Preface



Multi-criterion optimization deals with multiple, often conflicting objectives 
which naturally arise in a real-world scenario. The field of multiple criteria
decision making (MCDM) is well established, investigated by many researchers
and scientists, and widely applied in practice. Unlike in single-objective opti- 
mization, a multi-criterion optimization problem gives rise to a number of op- 
timal solutions, known as Pareto-optimal solutions, of which none can be said 
to be better than the others with respect to all objectives. Thus, one of the 
primary goals in multi-criterion optimization is to find or to approximate the set
of Pareto-optimal solutions. Since evolutionary algorithms work with a popula- 
tion of solutions, they have been used in multi-criterion optimization for more 
than a decade. To date, there exist a number of evolutionary approaches and 
application case studies, demonstrating the usefulness and efficiency of evolutio- 
nary multi-criterion optimization (EMO). Due to the growing interest in EMO, 
the general chairs envisaged organizing this first-ever international conference 
covering all aspects of the intersection of evolutionary computation and classical 
MCDM. The aim was to promote and share research activities in this promising 
field.

The first international conference on evolutionary multi-criterion optimiza- 
tion (EMO 2001) was held in Zurich at the Swiss Federal Institute of Tech- 
nology (ETH) on March 7-9, 2001. This event included two keynote speeches, 
one delivered by Ralph E. Steuer on current state-of-the-art methodology and 
the other delivered by Ian C. Parmee on real-world applications of evolutionary 
techniques. Furthermore, two extended tutorials were presented, one on classical 
multiple criteria decision making methodologies by Kaisa Miettinen and another 
one on evolutionary algorithms by Carlos A. Coello Coello. 

In response to the call for papers, 87 papers from 27 countries were submitted, 
each of which was independently reviewed by at least three members of the 
program committee. This volume presents a selection of 45 of the refereed papers, 
together with contributions based on the invited talks and tutorials. 

We would like to express our appreciation to the keynote speakers who ac- 
cepted our invitation, to the tutorial organizers, to all authors who submitted 
papers to EMO 2001, and to Marco Laumanns and Monica Fricker for their 
invaluable help in organizing the conference. 

Some Methods for Nonlinear Multi-objective Optimization



Kaisa Miettinen* 

University of Jyväskylä, Department of Mathematical Information Technology,
P.O. Box 35 (Agora), FIN-40351 Jyväskylä, Finland
miettine@mit.jyu.fi

WWW home page: http://www.mit.jyu.fi/miettine/engl.html 



Abstract. A general overview of nonlinear multiobjective optimization 
methods is given. The basic features of several methods are introduced so 
that an appropriate method could be found for different purposes. The 
methods are classified according to the role of a decision maker in the 
solution process. The main emphasis is devoted to interactive methods 
where the decision maker progressively provides preference information 
so that the most satisfactory solution can be found. 



1 Introduction 

Multiple criteria decision making (MCDM) problems form an extensive field 
where the best possible compromise should be found by evaluating several con- 
flicting objectives. There is a good reason to classify such problems on the ba- 
sis of their different characteristics. Here we concentrate on problems involving 
continuous nonlinear functions with deterministic values. We present versatile 
methods for solving such problems. 

The solution process usually requires the participation of a human decision 
maker (DM) who can give preference information related to conflicting goals. 
Here we assume that a single DM is involved. 

Methods are divided into four classes according to the role of the DM. Either 
no DM takes part in the solution process or (s)he expresses preference relations 
before, after or during the process. The last-mentioned, interactive, methods 
form the most extensive class of methods. 

Multiobjective optimization problems are usually solved by scalarization. 
Scalarization means that the problem is converted into one single or a family of 
single objective optimization problems. This new problem has a real-valued ob- 
jective function that possibly depends on some parameters and it can be solved 
using single objective optimizers. 

Further information about the methodology of deterministic multiobjective 
optimization can be found, e.g., in the monographs [6,12,14,20,27,30]. For a more 
detailed presentation of the methods treated here as well as other related topics 
we refer to [20] and references therein. 

* This research was supported by the Academy of Finland, grant #65760. 

2 Concepts and Background

A multiobjective optimization problem is of the form 

    minimize    $\{f_1(x), f_2(x), \dots, f_k(x)\}$
    subject to  $x \in S$,                                  (1)

involving $k$ ($\geq 2$) conflicting objective functions $f_i : \mathbb{R}^n \to \mathbb{R}$ that we want to
minimize simultaneously. The decision (variable) vectors $x = (x_1, x_2, \dots, x_n)^T$
belong to the (nonempty) feasible region $S \subset \mathbb{R}^n$. The feasible region is formed
by constraint functions but we do not fix them here.

We denote the image of the feasible region by $Z \subset \mathbb{R}^k$ and call it a feasible
objective region. The elements of $Z$ are called objective vectors and they consist
of objective (function) values $f(x) = (f_1(x), f_2(x), \dots, f_k(x))^T$. Note that if $f_i$
is to be maximized, it is equivalent to minimize $-f_i$.

In what follows, a function is called nondifferentiable if it is locally Lips- 
chitzian (and not necessarily continuously differentiable). 

Definition 1. When all the objective and the constraint functions are linear, 
the problem is called linear or an MOLP problem. If at least one of the functions 
is nonlinear, the problem is a nonlinear multiobjective optimization problem. 
Correspondingly, the problem is nondifferentiable if some of the functions are
nondifferentiable, and convex if all the objective functions and the feasible region
are convex.

Because of the contradiction and possible incommensurability of the objective 
functions, it is not possible to find a single solution that would optimize all the 
objectives simultaneously. In multiobjective optimization, vectors are regarded 
as optimal if their components cannot be improved without deterioration to at 
least one of the other components. This is usually called Pareto optimality. 

Definition 2. A decision vector $x^* \in S$ is Pareto optimal if there does not exist
another $x \in S$ such that $f_i(x) \leq f_i(x^*)$ for all $i = 1, \dots, k$ and $f_j(x) < f_j(x^*)$
for at least one index $j$. An objective vector is Pareto optimal if the corresponding
decision vector is Pareto optimal.
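
Definition 2 can be checked directly when only a finite list of objective vectors is at hand. The following Python sketch is an illustration under that assumption; the names `dominates` and `pareto_optimal` and the sample vectors are hypothetical and not part of the original text.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b in every objective and strictly better in one."""
    no_worse = all(ai <= bi for ai, bi in zip(a, b))
    strictly_better = any(ai < bi for ai, bi in zip(a, b))
    return no_worse and strictly_better


def pareto_optimal(vectors: List[Sequence[float]]) -> List[Sequence[float]]:
    """Keep the vectors that no other vector in the list dominates."""
    return [z for z in vectors if not any(dominates(other, z) for other in vectors)]


# Hypothetical objective vectors (f1, f2); both objectives are minimized.
candidates = [(1.0, 5.0), (2.0, 3.0), (3.0, 4.0), (4.0, 1.0), (2.0, 3.5)]
front = pareto_optimal(candidates)
print(front)
```

The result is Pareto optimal only relative to the listed vectors; for a continuous feasible region it is at best an approximation.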

There are usually many (infinitely many) Pareto optimal solutions and
they form a set of Pareto optimal solutions or a Pareto optimal set. This set can 
be nonconvex and nonconnected. 

Definition 2 introduces global Pareto optimality. Another important concept
is local Pareto optimality defined in a small environment of the point considered. 
Naturally, any globally Pareto optimal solution is locally Pareto optimal. The 
converse is valid for convex problems. To be more specific, if the feasible region 
is convex and the objective functions are quasiconvex with at least one strictly 
quasiconvex function, then locally Pareto optimal solutions are also globally 
Pareto optimal. 

Other related optimality concepts are weak and proper Pareto optimality.
The properly Pareto optimal set is a subset of the Pareto optimal set which is a 
subset of the weakly Pareto optimal set. 

A vector is weakly Pareto optimal if there does not exist any other feasible 
vector for which all the components are better. In other words, when compared 
to Definition 2, all the inequalities are strict. Weakly Pareto optimal solutions 
are often relevant from a technical point of view because they are sometimes 
easier to generate than Pareto optimal points. 

Pareto optimal solutions can be divided into improperly and properly Pareto 
optimal ones according to whether unbounded trade-offs between objectives are 
allowed or not. Proper Pareto optimality can be defined in several ways (see, e.g., 
[20]). According to [9] a solution is properly Pareto optimal if there is at least 
one pair of objectives for which a finite decrement in one objective is possible 
only at the expense of some reasonable increment in the other objective. 

Mathematically, all the Pareto optimal points are equally acceptable solutions 
of the multiobjective optimization problem. However, it is generally desirable to 
obtain one point as a solution. Selecting one out of the set of Pareto optimal 
solutions calls for a decision maker (DM). (S)he is a person who has better insight 
into the problem and who can express preference relations between different 
solutions. 

Finding a solution to (1) is called a solution process. It usually means the 
co-operation of the DM and an analyst. An analyst is a person or a computer 
program responsible for the mathematical side of the solution process. The an- 
alyst generates information for the DM to consider and the solution is selected 
according to the preferences of the DM. 

By solving a multiobjective optimization problem we here mean finding a 
feasible decision vector such that it is Pareto optimal and satisfies the DM. 
Assuming such a solution exists, it is called a final solution. 

The ranges of the Pareto optimal set provide valuable information for the 
solution process if the objective functions are bounded over the feasible region. 
The components $z_i^*$ of the ideal objective vector $z^* \in \mathbb{R}^k$ are obtained by minimizing
each of the objective functions individually subject to the constraints.
The ideal objective vector is not feasible because of the conflict among the objec- 
tives. From the ideal objective vector we obtain the lower bounds of the Pareto 
optimal set. Note that in nonconvex problems we need a global optimizer for 
calculating the ideal objective vector. 

The upper bounds of the Pareto optimal set, that is, the components of a
nadir objective vector $z^{nad}$, are usually rather difficult to obtain. They can be
estimated from a payoff table (see, e.g., [20]) but this is not a reliable way as
can be seen, e.g., in [15,31].

For nonlinear problems, there is no constructive method for calculating the
nadir objective vector. Nonetheless, the payoff table may be used as a rough
estimate as long as its lack of robustness is kept in mind. Because of the above-described
difficulty of calculating the actual nadir objective vector, $z^{nad}$ is usually an
approximation.
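
The ideal objective vector and the payoff-table estimate of the nadir objective vector can be sketched as follows. This is a minimal illustration that assumes the feasible region is replaced by a finite sample, so that exhaustive search stands in for a global optimizer; the function names and the toy objectives are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple


def payoff_table(objectives: Sequence[Callable], feasible: Sequence) -> List[Tuple[float, ...]]:
    """Row i holds all objective values at a minimizer of objective i."""
    table = []
    for f_i in objectives:
        x_best = min(feasible, key=f_i)  # exhaustive search over the sample
        table.append(tuple(f(x_best) for f in objectives))
    return table


def ideal_and_nadir_estimate(table: List[Tuple[float, ...]]):
    k = len(table)
    ideal = tuple(table[i][i] for i in range(k))                    # diagonal entries
    nadir_est = tuple(max(row[j] for row in table) for j in range(k))  # column maxima
    return ideal, nadir_est


# Toy problem (an assumption for illustration): f1 = x^2, f2 = (x - 2)^2 on [-1, 3].
objectives = [lambda x: x ** 2, lambda x: (x - 2.0) ** 2]
feasible = [i / 10 for i in range(-10, 31)]

table = payoff_table(objectives, feasible)
ideal, nadir_est = ideal_and_nadir_estimate(table)
print(ideal, nadir_est)
```

With more than two objectives, or with non-unique minimizers, the column maxima may over- or underestimate the true nadir values, which is the unreliability noted above.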

Sometimes we need a vector that is strictly better than every Pareto optimal
solution. Such a vector is called a utopian objective vector $z^{**} \in \mathbb{R}^k$ and its
components are formed by decreasing the components of $z^*$ by a positive scalar.

It is often assumed that the DM makes decisions on the basis of an underlying
function. This function representing the preferences of the DM is called a value
function $U : \mathbb{R}^k \to \mathbb{R}$ (see [14]). In many methods, the value (or utility) function
is assumed to be known implicitly.

Value functions are important in the development of solution methods and as 
a theoretical background. Generally, the value function is assumed to be strongly 
decreasing. This means that the preference of the DM will increase if the value 
of an objective function decreases while all the other objective values remain 
unchanged (i.e., less is preferred to more). In this case, the maximum of U is 
Pareto optimal. Regardless of the existence of a value function, it is usually 
assumed that less is preferred to more by the DM. 

Instead of as a maximum of the value function, a final solution can be un- 
derstood as a satisficing one. Satisficing decision making means that the DM 
does not intend to maximize any value function but tries to achieve certain as- 
pirations. A solution which satisfies all the aspirations of the DM is called a 
satisficing solution. 

During solution processes, various kinds of information are solicited from the 
DM. Aspiration levels $\bar{z}_i$, $i = 1, \dots, k$, are such desirable or acceptable levels in
the objective function values that are of special interest and importance to the
DM. The vector $\bar{z} \in \mathbb{R}^k$ is called a reference point.

According to the definition of Pareto optimality, moving from one Pareto 
optimal solution to another necessitates trading off. This is one of the basic 
concepts in multiobjective optimization. A trade-off reflects the ratio of change 
in the values of the objective functions concerning the increment of one objective 
function that occurs when the value of some other objective function decreases. 
For details, see, e.g., [6,20]. 

It is said that two feasible solutions are situated on the same indifference 
curve if the DM finds them equally desirable. For any two Pareto optimal so- 
lutions on the same indifference curve there is a trade-off involving a certain 
increment in one objective function value that the DM can tolerate in exchange 
for a certain amount of decrement in some other objective function while the 
preferences of the two solutions remain the same. This is called the marginal
rate of substitution $m_{ij}(x^*)$ ($i, j = 1, \dots, k$, $i \neq j$).

To conclude this section, let us have a look at how the Pareto optimality of 
a feasible decision vector can be tested. This topic is investigated, e.g., in [1,20]. 
A decision vector $x^* \in S$ is Pareto optimal if and only if the problem

    minimize    $-\sum_{i=1}^{k} \varepsilon_i$
    subject to  $f_i(x) + \varepsilon_i = f_i(x^*)$ for all $i = 1, \dots, k$,
                $\varepsilon_i \geq 0$ for all $i = 1, \dots, k$,                 (2)
                $x \in S$

has an optimal objective function value of zero, where both $x \in \mathbb{R}^n$ and $\varepsilon \in \mathbb{R}^k_+$
are variables. On the other hand, if (2) has a finite nonzero optimal objective
function value obtained at $\hat{x}$, then $\hat{x}$ is Pareto optimal. Note that the equalities
in (2) can be replaced with inequalities.
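
The test based on (2) can be sketched over a finite sample of the feasible region. The slacks $\varepsilon_i = f_i(x^*) - f_i(x)$ are nonnegative exactly when $x$ is no worse than $x^*$ in every objective, and the sketch searches for the largest total slack; a total of zero means that no sampled point improves on $x^*$. The function name `pareto_test` and the toy problem are assumptions made for illustration.

```python
from typing import Callable, Sequence, Tuple


def pareto_test(objectives: Sequence[Callable], feasible: Sequence, x_star) -> Tuple[float, object]:
    """Return (largest total slack, point attaining it) among points no worse than x_star."""
    z_star = [f(x_star) for f in objectives]
    best_slack, best_x = 0.0, x_star
    for x in feasible:
        slacks = [zs - f(x) for zs, f in zip(z_star, objectives)]
        # Only points with all slacks nonnegative satisfy the constraints of the test.
        if all(s >= 0 for s in slacks) and sum(slacks) > best_slack:
            best_slack, best_x = sum(slacks), x
    return best_slack, best_x


# Toy problem (an assumption): f1 = x^2, f2 = (x - 2)^2 sampled on [-1, 3].
objectives = [lambda x: x ** 2, lambda x: (x - 2.0) ** 2]
feasible = [i / 10 for i in range(-10, 31)]

print(pareto_test(objectives, feasible, 1.0))  # total slack 0: passes the test
print(pareto_test(objectives, feasible, 3.0))  # positive slack: fails, improved point returned
```

In the second call the returned point plays the role of the improved solution obtained when the optimal value is nonzero.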



3 Methods 

Mathematically, the multiobjective optimization problem is considered to be 
solved when the Pareto optimal set is found. This is also known as vector op- 
timization. However, this is not always enough. Instead, we want to obtain one 
final solution. This means that we must find a way to order the Pareto optimal 
solutions and here we need a DM and her/his preferences. 

In what follows, we present several methods for finding a final solution. We 
cannot cover every existing method but we introduce several philosophies and 
ways of approaching the problem. 

The methods can be classified in many ways. Here we apply the classification 
presented in [12] based on the participation of the DM in the solution process. 
The classes are no-preference methods, a posteriori methods, a priori methods 
and interactive methods. Note that no classification can be complete and over- 
lapping and combinations of classes are possible. 

In addition, we consider an alternative way of classification into ad hoc and 
non ad hoc methods. This division, suggested in [29], is based on the existence 
of an underlying value function. Even if one knew the DM’s value function, one 
would not exactly know how to respond to the questions posed by an ad hoc 
algorithm. On the other hand, in non ad hoc methods the responses can be 
determined or at least confidently simulated with the help of a value function. 

In no-preference methods, the opinions of the DM are not taken into consid- 
eration. Thus, the problem is solved using some relatively simple method and the 
solution is presented to the DM who may either accept or reject it. For details 
of this class see, e.g., [20]. Next we introduce examples of a posteriori, a priori
and interactive methods. 



4 A Posteriori Methods 

A posteriori methods could also be called methods for generating Pareto optimal 
solutions. After the Pareto optimal set (or a part of it) has been generated, it is 
presented to the DM, who selects the most preferred solution. The inconveniences
here are that the generation process is usually computationally expensive and
sometimes, at least in part, difficult. On the other hand, it is hard for the DM
to select from a large set of alternatives. An important question related to this 
is how to display the alternatives to the DM in an illustrative way. 

If there are only two objective functions, the Pareto optimal set can be gen- 
erated parametrically (see, e.g., [2,8]). The problem becomes more complicated 
with more objectives. 

4.1 Weighting Method

In the weighting method (see, e.g., [8,35]), we solve the problem

    minimize    $\sum_{i=1}^{k} w_i f_i(x)$
    subject to  $x \in S$,                                  (3)

where $w_i \geq 0$ for all $i = 1, \dots, k$ and $\sum_{i=1}^{k} w_i = 1$. The solution of (3) is weakly
Pareto optimal and it is Pareto optimal if $w_i > 0$ for all $i = 1, \dots, k$ or the
solution is unique.

The weakness of the weighting method is that not all of the Pareto optimal 
solutions can be found unless the problem is convex. The conditions under which 
the whole Pareto optimal set can be generated by the weighting method with 
positive weights are presented in [5].

Systematic ways of perturbing the weights to obtain different Pareto optimal 
solutions are suggested, e.g., in [6] (pp. 234-236). In addition, an algorithm for 
generating different weights automatically for convex problems to produce an 
approximation of the Pareto optimal set is proposed in [4].

The method has several weaknesses. On the one hand, a small change in 
the weights may cause big changes in the objective vectors. On the other hand, 
dramatically different weights may produce nearly similar objective vectors. In 
addition, an evenly distributed set of weights does not necessarily produce an 
evenly distributed representation of the Pareto optimal set. 
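
The restriction to convex problems can be illustrated with a small sketch of the weighting method on a finite sample. The toy problem below is an assumption chosen so that every sampled point is Pareto optimal while the Pareto optimal set in the objective space is nonconvex; the function name `weighting_method` is hypothetical.

```python
from typing import Callable, Sequence


def weighting_method(objectives: Sequence[Callable], feasible: Sequence, weights: Sequence[float]):
    """Minimize the weighted sum of the objectives over a finite sample."""
    return min(feasible, key=lambda x: sum(w * f(x) for w, f in zip(weights, objectives)))


# Nonconvex toy problem (an assumption): f1 = x, f2 = 1 - x^2 on [0, 1].
# f1 increases and f2 decreases in x, so every sampled point is Pareto optimal.
objectives = [lambda x: x, lambda x: 1.0 - x ** 2]
feasible = [i / 100 for i in range(101)]

# Nine evenly spaced positive weighting vectors (w, 1 - w).
found = sorted({weighting_method(objectives, feasible, (i / 10, 1 - i / 10)) for i in range(1, 10)})
print(found)
```

Only the two end points of the Pareto optimal set are returned, whatever the weights; all the solutions in between are missed.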

4.2 ε-Constraint Method

In the ε-constraint method, introduced in [11], one of the objective functions is
optimized in the form

    minimize    $f_i(x)$
    subject to  $f_j(x) \leq \varepsilon_j$ for all $j = 1, \dots, k$, $j \neq i$,      (4)
                $x \in S$,

where $i \in \{1, \dots, k\}$ and $\varepsilon_j$ are upper bounds for the objectives $j \neq i$.

The solution of (4) is weakly Pareto optimal. On the other hand, $x^* \in S$ is
Pareto optimal if and only if it solves (4) for every $i = 1, \dots, k$, where $\varepsilon_j = f_j(x^*)$
for $j = 1, \dots, k$, $j \neq i$. In addition, the unique solution of (4) is Pareto optimal for
any upper bounds. Thus, finding any Pareto optimal solution does not necessitate
convexity.

In order to ensure Pareto optimality in this method, we have to either solve
$k$ different problems or obtain a unique solution. In general, uniqueness is not
necessarily easy to verify. Systematic ways of perturbing the upper bounds to
obtain different Pareto optimal solutions are suggested in [6] (pp. 283-295).
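
A minimal sketch of problem (4) on a finite sample is given below. It reuses a nonconvex toy problem (an assumption for illustration) to show that the method reaches solutions in the interior of the Pareto optimal set; the function name `epsilon_constraint` and the chosen bounds are hypothetical.

```python
from typing import Callable, Dict, Optional, Sequence


def epsilon_constraint(objectives: Sequence[Callable], feasible: Sequence,
                       i: int, eps: Dict[int, float]) -> Optional[float]:
    """Minimize objective i subject to f_j(x) <= eps[j] for all j != i; None if infeasible."""
    admissible = [x for x in feasible
                  if all(f(x) <= eps[j] for j, f in enumerate(objectives) if j != i)]
    return min(admissible, key=objectives[i]) if admissible else None


# Nonconvex toy problem (an assumption): f1 = x, f2 = 1 - x^2 on [0, 1].
objectives = [lambda x: x, lambda x: 1.0 - x ** 2]
feasible = [i / 100 for i in range(101)]

# Minimize f1 (index 0) under three different upper bounds on f2 (index 1).
solutions = [epsilon_constraint(objectives, feasible, 0, {1: bound}) for bound in (1.0, 0.75, 0.0)]
print(solutions)
```

Varying the upper bound moves the solution along the Pareto optimal set, including points that a weighted sum of the objectives would not return for this problem.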

4.3 Method of Weighted Metrics 

In the method of weighted metrics, the distance between some reference point 
and the feasible objective region is minimized. A common way is to use the ideal 
objective vector and $L_p$-metrics. We can produce different solutions by weighting
the metrics. This method is also sometimes called compromise programming.

The solution obtained depends greatly on the value chosen for $p$.
For $1 \leq p < \infty$ we have a problem

    minimize    $\left( \sum_{i=1}^{k} w_i |f_i(x) - z_i^*|^p \right)^{1/p}$
    subject to  $x \in S$.                                  (5)

The exponent $1/p$ can be dropped. For $p = \infty$ we have a weighted Tchebycheff
problem

    minimize    $\max_{i=1,\dots,k} \left[ w_i |f_i(x) - z_i^*| \right]$
    subject to  $x \in S$.                                  (6)

Notice that no absolute values are needed if we know the global ideal objective 
vector. The solution of (5) is Pareto optimal if either the solution is unique or 
all the weights are positive. Furthermore, the solution of (6) is weakly Pareto 
optimal for positive weights. Finally, (6) has at least one Pareto optimal solution. 

Convexity of the problem is needed in order to guarantee that every Pareto 
optimal solution can be found by (5). On the other hand, any Pareto optimal 
solution can be found by (6) when z** is used as a reference point. 

Weakly Pareto optimal solutions can be avoided in (6) by giving a slight slope 
to the contour of the metric (see [27]). The augmented problem to be solved is 

    minimize    $\max_{i=1,\dots,k} \left[ w_i |f_i(x) - z_i^{**}| \right] + \rho \sum_{i=1}^{k} |f_i(x) - z_i^{**}|$
    subject to  $x \in S$,                                  (7)

where $\rho$ is a sufficiently small positive scalar. In this case, it may be impossible
to find every Pareto optimal solution. Instead, (7) generates properly Pareto 
optimal solutions and any properly Pareto optimal solution can be found. 
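
The weighted Tchebycheff problem (6) and its augmented form (7) can be sketched with one function on a finite sample, the augmentation being switched on by a positive coefficient. The toy problem, the utopian vector and the names `weighted_tchebycheff` and `rho` are assumptions made for illustration.

```python
from typing import Callable, Sequence


def weighted_tchebycheff(objectives: Sequence[Callable], feasible: Sequence,
                         weights: Sequence[float], ref: Sequence[float], rho: float = 0.0):
    """Minimize max_i w_i |f_i(x) - ref_i| + rho * sum_i |f_i(x) - ref_i| over a sample."""
    def value(x):
        devs = [abs(f(x) - r) for f, r in zip(objectives, ref)]
        return max(w * d for w, d in zip(weights, devs)) + rho * sum(devs)
    return min(feasible, key=value)


# Nonconvex toy problem (an assumption): f1 = x, f2 = 1 - x^2 on [0, 1].
objectives = [lambda x: x, lambda x: 1.0 - x ** 2]
feasible = [i / 100 for i in range(101)]
utopian = (-0.01, -0.01)  # ideal vector (0, 0) decreased by a small positive scalar

x_plain = weighted_tchebycheff(objectives, feasible, (1.0, 1.0), utopian)
x_augmented = weighted_tchebycheff(objectives, feasible, (1.0, 1.0), utopian, rho=0.001)
print(x_plain, x_augmented)
```

Both calls return a solution in the interior of the nonconvex Pareto optimal set, in line with the statement that convexity is not needed when the utopian objective vector is used as the reference point.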

Let us mention that different connections between the weighting method, the
ε-constraint method and the method of weighted metrics are presented in [17].



4.4 Achievement Scalarizing Function Approach 

Scalarizing functions of a special type are called achievement (scalarizing) func- 
tions. They have been introduced by Wierzbicki, e.g., in [32,33]. These functions 
are of the form $s_{\bar{z}} : Z \to \mathbb{R}$, where $\bar{z} \in \mathbb{R}^k$ is an arbitrary reference point. Because
we do not know $Z$ explicitly, in practice we minimize the function $s_{\bar{z}}(f(x))$
subject to $x \in S$.

We can define so-called order-representing and order-approximating achieve- 
ment functions. Then we have the following properties for any reference point: If 
the achievement function is order-representing, then its solution is weakly Pareto 
optimal and if the function is order-approximating, then its solution is Pareto 
optimal. On the other hand, any (weakly) Pareto optimal solution can be found 
if the achievement function is order-representing. Thus, weakly Pareto optimal 
or Pareto optimal solutions can be obtained by moving the reference point only. 

There are many achievement functions satisfying the above-presented conditions.
An example of order-representing functions is $s_{\bar{z}}(z) = \max_{i=1,\dots,k} [w_i (z_i - \bar{z}_i)]$,
where $w$ is some fixed positive weighting vector. An example of order-approximating
achievement functions is

    $s_{\bar{z}}(z) = \max_{i=1,\dots,k} [w_i (z_i - \bar{z}_i)] + \rho \sum_{i=1}^{k} w_i (z_i - \bar{z}_i)$,      (8)

where $w$ is as above and $\rho > 0$.
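
How moving the reference point alone produces different solutions can be sketched with the order-approximating function (8). The example minimizes the achievement function over a finite sample of a toy problem; the sample, the reference points and the names `achievement` and `solve` are assumptions for illustration.

```python
from typing import Callable, Sequence


def achievement(z: Sequence[float], ref: Sequence[float],
                weights: Sequence[float], rho: float = 0.01) -> float:
    """Order-approximating achievement function of form (8)."""
    diffs = [w * (zi - ri) for w, zi, ri in zip(weights, z, ref)]
    return max(diffs) + rho * sum(diffs)


def solve(objectives: Sequence[Callable], feasible: Sequence,
          ref: Sequence[float], weights: Sequence[float]):
    """Minimize the achievement function of the objective vector f(x) over a sample."""
    return min(feasible, key=lambda x: achievement([f(x) for f in objectives], ref, weights))


# Toy problem (an assumption): f1 = x^2, f2 = (x - 2)^2 sampled on [-1, 3].
objectives = [lambda x: x ** 2, lambda x: (x - 2.0) ** 2]
feasible = [i / 10 for i in range(-10, 31)]
weights = (1.0, 1.0)

x_a = solve(objectives, feasible, (1.0, 1.0), weights)   # attainable reference point
x_b = solve(objectives, feasible, (4.0, 0.25), weights)  # reference point favouring f2
print(x_a, x_b)
```

The reference point need not be attainable or unattainable in any particular way; the function is minimized in both cases and the weights stay fixed.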

5 A Priori Methods 

In a priori methods, the DM must specify her/his preferences, hopes and opinions 
before the solution process. Unfortunately, the DM does not necessarily know 
beforehand what it is possible to attain in the problem and how realistic her/his 
expectations are. 

5.1 Value Function Method 

The value function approach was already mentioned earlier. It is an excellent 
method if the DM happens to know an explicit mathematical formulation for 
the value function and if that function represents wholly her/his preferences. 
Unfortunately, it may be difficult, if not impossible, to get that mathematical 
expression. On the other hand, the value function can be difficult to optimize
because of its possibly complicated nature.

Note that the DM’s preferences must satisfy certain conditions so that a 
value function can be defined on them. The DM must, e.g., be able to specify 
consistent preferences. 

5.2 Lexicographic Ordering 

In lexicographic ordering, the DM must arrange the objective functions accord- 
ing to their absolute importance. This ordering means that a more important 
objective is infinitely more important than a less important objective. After 
ordering, the most important objective function is minimized subject to the 
original constraints. If this problem has a unique solution, it is the final one. 
Otherwise, the second most important objective function is minimized. Now, a 
new constraint is added to guarantee that the most important objective function 
preserves its optimal value. If this problem has a unique solution, it is the final 
one. Otherwise, the process goes on. 

The solution of lexicographic ordering is Pareto optimal. The method is quite 
simple and people usually make decisions successively. However, the DM may 
have difficulties in specifying an absolute order of importance. Besides, it is very 
likely that the process stops before less important objective functions are taken 
into consideration. 

Note that lexicographic ordering does not allow a small increment of an
important objective function to be traded off with a great decrement of a less 
important objective. Yet, trading off might often be appealing to the DM. 
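
The successive minimization of lexicographic ordering can be sketched on a finite set of alternatives. The alternatives, the three objectives and the tolerance `tol` are assumptions for illustration; the objectives are listed from the most to the least important.

```python
from typing import Callable, List, Sequence


def lexicographic(objectives: Sequence[Callable], feasible: Sequence, tol: float = 1e-12) -> List:
    """Minimize the objectives one at a time, keeping only the minimizers of the previous ones."""
    candidates = list(feasible)
    for f in objectives:  # most important objective first
        best = min(f(x) for x in candidates)
        # The next problem is restricted to points preserving the optimal value found so far.
        candidates = [x for x in candidates if f(x) <= best + tol]
        if len(candidates) == 1:  # unique solution: the process stops here
            break
    return candidates


# Hypothetical alternatives (a, b) with a, b in {0, 1, 2}.
feasible = [(a, b) for a in range(3) for b in range(3)]
objectives = [
    lambda x: (x[0] - 1) ** 2,         # most important
    lambda x: (x[1] - 2) ** 2 + x[0],  # second most important
    lambda x: x[0] + x[1],             # least important
]

result = lexicographic(objectives, feasible)
print(result)
```

In this example the solution is already unique after the second objective, so the third objective is never consulted.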

5.3 Goal Programming 

Goal programming is one of the first methods expressly created for multiobjective
optimization. The DM is asked to specify aspiration levels $\bar{z}_i$ ($i = 1, \dots, k$)
for the objective functions and deviations from these aspiration levels are minimized.
An objective function jointly with an aspiration level forms a goal. For
minimization problems, goals are of the form $f_i(x) \leq \bar{z}_i$. Here the aspiration
levels are assumed to be selected so that they are not achievable simultaneously.
Next, the overachievements $\delta_i$ of the objective function values are minimized.

The method has several variants. In the weighted approach, see [7], the 
weighted sum of the deviational variables is minimized. This means that in 
addition to the aspiration levels, the DM must specify positive weights. Then we 
have a problem 

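    minimize    $\sum_{i=1}^{k} w_i \delta_i$
    subject to  $f_i(x) - \delta_i \leq \bar{z}_i$ for all $i = 1, \dots, k$,
                $\delta_i \geq 0$ for all $i = 1, \dots, k$,                  (9)
                $x \in S$,

where $x \in \mathbb{R}^n$ and $\delta_i$ ($i = 1, \dots, k$) are the variables and $\delta_i = \max[0, f_i(x) - \bar{z}_i]$.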

In the lexicographic approach, the DM must specify a lexicographic order for 
the goals in addition to the aspiration levels. After the lexicographic ordering, the 
problem with the deviational variables as objective functions is solved subject 
to the constraints of (9) as explained in Sect. 5.2. 

A combination of the weighted and the lexicographic approaches is quite 
popular. In this case, several objective functions may belong to the same class 
of importance in the lexicographic order. In each priority class, a weighted sum 
of the deviational variables is minimized. 

The solution of a weighted or a lexicographic goal programming problem is 
Pareto optimal if either the aspiration levels form a Pareto optimal reference 
point or all the deviational variables $\delta_i$ have positive values at the optimum. In
other words, if the aspiration levels are all feasible, the solution is equal to the 
reference point that is not necessarily Pareto optimal. 

Goal programming is a very widely used and popular solution method. Goal- 
setting is an understandable and easy way of making decisions. The specification 
of the weights or the lexicographic ordering may be more difficult. It may also 
be hard to specify weights because they have no direct physical meaning. 
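
The weighted approach can be sketched on a finite sample by inserting the overachievements $\delta_i = \max[0, f_i(x) - \bar{z}_i]$ directly into the weighted sum. The toy problem, the aspiration levels and the function name are assumptions for illustration.

```python
from typing import Callable, Sequence


def weighted_goal_programming(objectives: Sequence[Callable], feasible: Sequence,
                              aspirations: Sequence[float], weights: Sequence[float]):
    """Minimize the weighted sum of overachievements max(0, f_i(x) - aspiration_i)."""
    def total_deviation(x):
        return sum(w * max(0.0, f(x) - zbar)
                   for w, f, zbar in zip(weights, objectives, aspirations))
    return min(feasible, key=total_deviation)


# Toy problem (an assumption): f1 = x^2, f2 = (x - 2)^2 sampled on [-1, 3].
objectives = [lambda x: x ** 2, lambda x: (x - 2.0) ** 2]
feasible = [i / 10 for i in range(-10, 31)]

# Aspiration levels that cannot be met simultaneously.
x_equal = weighted_goal_programming(objectives, feasible, (0.25, 0.25), (1.0, 1.0))
x_f1_heavy = weighted_goal_programming(objectives, feasible, (0.25, 0.25), (3.0, 1.0))

# Aspiration levels that are easy to meet: every point meeting them has zero deviation,
# so the point returned (the first one found) need not be Pareto optimal.
x_loose = weighted_goal_programming(objectives, feasible, (9.0, 9.0), (1.0, 1.0))
print(x_equal, x_f1_heavy, x_loose)
```

The last call returns a point outside the Pareto optimal set of the toy problem, which illustrates why attainable aspiration levels do not guarantee Pareto optimality.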

6 Interactive Methods 

The extensive interest devoted to interactive methods can be explained by the
fact that, assuming the DM has enough time and capabilities for co-operation,
interactive methods can be presumed to produce the most satisfactory results.
Many of the weak points of the methods in the other method classes are over- 
come. Namely, only part of the Pareto optimal points has to be generated and 
evaluated, the DM is not overloaded with information, and the DM can specify 
and correct her/his preferences and selections as the solution process continues 
and (s)he gets to know the problem and its potentialities better. This also means 
that the DM does not have to know any global preference structure. In addition, 
the DM can be assumed to have more confidence in the final solution since (s)he 
is involved throughout the solution process. 

In interactive methods, the DM works together with an analyst or an inter- 
active computer program. One can say that the analyst tries to determine the 
preference structure of the DM in an interactive way. After every iteration, some 
information is given to the DM and (s)he is asked to answer some questions or 
provide some other type of information. 

Interactive methods differ from each other by the form in which information 
is given to the DM, by the form in which information is provided by the DM, and 
how the problem is transformed into a single objective optimization problem. It 
is always important that the DM finds the method worthwhile and acceptable 
and is able to use it properly. 

There are three main stopping criteria in interactive methods. Either the DM 
finds a desirable solution and is convinced that it is preferred to all the other 
Pareto optimal solutions (see [16]), some algorithmic stopping or convergence 
rule is fulfilled or the DM gets tired of the solution process. 

The number of interactive methods is large. Here we briefly describe some of 
them. In all the methods, less is assumed to be preferred to more by the DM. 
For more details, see [20] and references therein. 



6.1 Geoffrion-Dyer-Feinberg Method 

The Geoffrion-Dyer-Feinberg (GDF) method, proposed in [10], is one of the 
most well-known interactive methods and it is based on the maximization of 
the underlying (implicitly known) value function. The objective functions are 
assumed to be continuously differentiable and the feasible region S must be 
compact and convex. 

Marginal rates of substitution specified by the DM at the current point $x^h$ are
here used to approximate the direction of steepest ascent of the value function.
We have $m_{ij}(x^h) = m_i$ for $i = 1, \dots, k$, $i \neq j$, and $m_j = 1$. Then the approximation is
optimized by the method of Frank and Wolfe by solving the problem

    minimize    $\left( \sum_{i=1}^{k} m_i \nabla_x f_i(x^h) \right)^T y$
    subject to  $y \in S$                                   (10)

with $y \in \mathbb{R}^n$ being the variable.

The basic phases of the GDF algorithm are the following.

1. Ask the DM to specify a reference function. 

2. Ask the DM to specify marginal rates of substitution between the reference 
function and the other objectives at the current solution point. 

3. Solve (10). Set the search direction as the difference between the old (i.e. cur- 
rent) and the new solution. 

4. Determine with the help of the DM the appropriate step-size to be taken in 
the direction. 

5. If the DM wants to continue, go to step (2). Otherwise, stop. 

When determining the step-size, the DM is asked to select the most preferred 
objective vector obtained with different step-sizes taken in the search direction. 
Note that the alternatives are not necessarily Pareto optimal. It is obvious that 
the task of selection becomes more difficult for the DM as the number of objective 
functions increases. 

The GDF method can be characterized as a non ad hoc method. If one
knows the value function, it is easy to specify the marginal rates of substitution 
and select the best alternative. In spite of the plausible theoretical foundation 
of the GDF method, it is not so convincing and powerful in practice. The most 
important difficulty for the DM is determining the $k - 1$ marginal rates
of substitution at each iteration. Even more difficult is to give consistent and 
correct marginal rates of substitution at every iteration. 

6.2 Tchebycheff Method 

The Tchebycheff method, presented in [27] (pp. 419-450) and refined in [28], is 
an interactive weighting vector space reduction method where value functions 
are not used. The method has been designed to be user-friendly for the DM and, 
thus, complicated information is not required. It is assumed that the objective 
functions are bounded (from below). 

To start with, a utopian objective vector is established. Then the distance 
from the utopian objective vector to the feasible objective region, measured by 
the weighted Tchebycheff metric, is minimized. Different solutions are obtained 
with well dispersed positive weighting vectors in the metric, as introduced in 
Sect. 4.3. 

At the first iteration, a sample of the whole Pareto optimal set is generated. 
The solution space is reduced by tightening the upper and the lower bounds for 
the weights. Then a concentrated group of weighting vectors centred about the 
selected one is formed. Thus, the idea is to develop a sequence of progressively 
smaller subsets of the Pareto optimal set until a final solution is located. 

Every Pareto optimal solution of the problem can be found by solving the weighted
Tchebycheff problem with $z^{**}$, but some of the solutions may be weakly Pareto
optimal. Here this weakness is overcome by formulating the distance minimization
problem in a lexicographic form:

$$
\begin{aligned}
\text{lex minimize} \quad & \max_{i=1,\dots,k} \left[ w_i \left( f_i(x) - z_i^{**} \right) \right], \ \sum_{i=1}^{k} \left( f_i(x) - z_i^{**} \right) \\
\text{subject to} \quad & x \in S.
\end{aligned}
\qquad (11)
$$

The solution of (11) is Pareto optimal and any Pareto optimal solution can be
found.

The number of the alternative objective vectors to be presented to the DM is 
denoted by P. It may be fixed or different at each iteration. We can now present 
the main steps of the Tchebycheff algorithm. 

1. Specify values for the set size P and the number of iterations H. Construct 
the utopian objective vector. Set h = 1. 

2. Form the current weighting vector space and generate 2P dispersed weighting 
vectors. 

3. Solve (11) for each of the 2P weighting vectors. 

4. Present the P most different of the resulting objective vectors to the DM 
and let her/him choose the most preferred among them. 

5. If $h = H$, stop. Otherwise, gather information for reducing the weighting
vector space, set $h = h + 1$ and go to step (2).
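Steps 2 and 3 of the algorithm can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the feasible region S is replaced by a finite list of candidate objective vectors, weighting vectors are drawn at random inside per-component bounds instead of being filtered for dispersion, and the helper names (`lex_tchebycheff_key`, `solve_lex_tchebycheff`, `sample_weights`, `tchebycheff_iteration`) are hypothetical. Choosing the P most different results (step 4) and shrinking the bounds (step 5) are left out.

```python
import random


def lex_tchebycheff_key(f, w, z_utopian):
    """Two-level objective of (11): weighted max term first, plain sum second."""
    diffs = [fi - zi for fi, zi in zip(f, z_utopian)]
    return (max(wi * d for wi, d in zip(w, diffs)), sum(diffs))


def solve_lex_tchebycheff(candidates, w, z_utopian):
    """Solve (11) over a finite candidate list standing in for S.

    Python compares tuples lexicographically, so ties in the max term
    are broken by the sum term.
    """
    return min(candidates, key=lambda f: lex_tchebycheff_key(f, w, z_utopian))


def sample_weights(bounds, rng):
    """Draw one positive weighting vector inside [lo, hi] bounds, normalised to sum to one."""
    raw = [rng.uniform(lo, hi) for lo, hi in bounds]
    total = sum(raw)
    return [r / total for r in raw]


def tchebycheff_iteration(candidates, bounds, z_utopian, P, rng):
    """Steps 2-3: generate 2P weighting vectors and solve (11) for each of them."""
    weights = [sample_weights(bounds, rng) for _ in range(2 * P)]
    return [(w, solve_lex_tchebycheff(candidates, w, z_utopian)) for w in weights]
```

With equal weights and the utopian vector at the origin, the candidates (2, 2) and (2, 1) tie on the max term; the sum term then selects (2, 1), which is how the lexicographic form avoids weakly Pareto optimal points.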

The predetermined number of iterations is not necessarily conclusive. The 
DM can stop iterating when (s)he obtains a satisfactory solution or continue the 
solution process longer if necessary. 

All the DM has to do in the Tchebycheff method is to compare several Pareto 
optimal objective vectors and select the most preferred one. The ease of the 
comparison depends on the magnitude of P and on the number of objective 
functions. This can be characterized as a non ad hoc method. If the value function 
is known, it is easy to select the alternative maximizing the value function. 

The flexibility of the method is reduced by the fact that the discarded parts of 
the weighting vector space cannot be restored if the DM changes her/his mind. 
Thus, some consistency is required. The weakness of the Tchebycheff method 
is that a great deal of calculation is needed at each iteration and many of the 
results are discarded. For large and complex problems, the Tchebycheff method 
is not a realistic choice. On the other hand, parallel computing can be utilized. 

6.3 Reference Point Method 

As its name suggests, the reference point method (see, e.g., [32]) is based on a 
reference point of aspiration levels. The reference point is used to derive achieve- 
ment scalarizing functions as introduced in Sect. 4.4. No specific assumptions 
are set on the problem to be solved. The reference point idea has been utilized 
in several methods in different ways. Wierzbicki’s reference point method (to be 
discussed here) was among the first of them. 

Before the solution process starts, some information is given to the DM about 
the problem. If possible, the ideal objective vector and the (approximated) nadir 
objective vector are presented to illustrate the ranges of the Pareto optimal 
set. Another possibility is to minimize and maximize the objective functions 
individually in the feasible region (if it is bounded). An appropriate form for the
achievement function must also be selected. 







The basic steps of the reference point method are the following: 

1. Present information about the problem to the DM. 

2. Ask the DM to specify a reference point. 

3. Minimize the achievement function. Present the solution to the DM. 

4. Calculate a number of k other (weakly) Pareto optimal solutions by mini- 
mizing the achievement function with perturbed reference points. 

5. Present the alternatives to the DM. If (s)he finds one of the $k + 1$ solutions
satisfactory, stop. Otherwise, go to step (2). 
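Steps 3 and 4 can be illustrated with a short Python sketch. Several things here are assumptions, not taken from the text: the achievement function is an augmented weighted max form (one common choice; the form actually intended is the one of Sect. 4.4), the feasible region is replaced by a finite list of objective vectors, and each perturbed reference point shifts one component of the reference point by the distance between the reference point and the solution obtained. The names `achievement`, `minimize_achievement` and `reference_point_step` are hypothetical.

```python
import math


def achievement(f, ref, w, rho=1e-6):
    """Assumed achievement function: weighted max deviation plus a small augmentation term."""
    terms = [wi * (fi - ri) for fi, ri, wi in zip(f, ref, w)]
    return max(terms) + rho * sum(terms)


def minimize_achievement(candidates, ref, w):
    """Step 3: minimise the achievement function over a finite candidate list."""
    return min(candidates, key=lambda f: achievement(f, ref, w))


def reference_point_step(candidates, ref, w):
    """Steps 3-4: one solution for the reference point, k more for perturbed ones."""
    main = minimize_achievement(candidates, ref, w)
    d = math.dist(ref, main)  # assumed perturbation size
    alternatives = []
    for i in range(len(ref)):
        perturbed = list(ref)
        perturbed[i] += d  # relax the aspiration level of objective i only
        alternatives.append(minimize_achievement(candidates, perturbed, w))
    return main, alternatives
```

The DM would then be shown `main` together with the k entries of `alternatives`, which is the set of k + 1 solutions referred to in step 5.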

The reference point method can be characterized as an ad hoc method or a 
method having both non ad hoc and ad hoc features. Alternatives are easy to 
compare if the value function is known. On the other hand, a reference point 
cannot be directly defined with the help of the value function. However, it is 
possible to test whether a new reference point has a higher value function value 
than the earlier solutions. 

The reference point method is quite easy for the DM to understand. The DM 
only has to specify appropriate aspiration levels and compare objective vectors. 
What has earlier been said about the comparison of alternatives is also valid 
here. The solutions are weakly or Pareto optimal depending on the achievement 
function employed. 

The freedom of the DM has both positive and negative aspects. On the one 
hand, the DM can direct the solution process and is free to change her/his mind 
during the process. On the other hand, there is no clear strategy to produce the 
final solution since the method does not help the DM to find improved solutions. 
A software family called DIDAS (Dynamic Interactive Decision Analysis and 
Support) has been developed on the basis of the reference point ideas (see [34] 
for details). 

6.4 GUESS Method 

The GUESS method is a simple interactive method related to the reference point 
method. The method is also sometimes called a naive method and it is presented 
in [3]. The ideal objective vector $z^*$ and the nadir objective vector $z^{\mathrm{nad}}$ are
required to be available. 

The method proceeds as follows. The DM specifies a reference point (or a 
guess) below the nadir objective vector and the minimum weighted deviation 
from the nadir objective vector is maximized. Then the DM specifies a new 
reference point and the iteration continues until the DM is satisfied with the 
solution produced. 

The problem to be solved is

$$
\begin{aligned}
\text{minimize} \quad & \max_{i=1,\dots,k} \left[ \frac{f_i(x) - z_i^{\mathrm{nad}}}{z_i^{\mathrm{nad}} - \bar{z}_i} \right] \\
\text{subject to} \quad & x \in S.
\end{aligned}
\qquad (12)
$$

The solution of (12) is weakly Pareto optimal and any Pareto optimal solution
can be found. The algorithm can be formulated as follows.

1. Calculate the ideal objective vector and the nadir objective vector and
present them to the DM.

2. Let the DM specify upper or lower bounds to the objective functions if (s)he 
so desires. Update (12), if necessary. 

3. Ask the DM to specify a reference point between the ideal and the nadir 
objective vectors. 

4. Solve (12) and present the solution to the DM. 

5. If the DM is satisfied, stop. Otherwise, go to step (2). 
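Step 4 can be sketched as follows. The sketch assumes that the objective of (12) is the largest deviation from the nadir objective vector, each component scaled by the gap between the nadir value and the reference point; this form is an assumption made for illustration and should be checked against [3]. The feasible region is again replaced by a finite list of objective vectors, and `guess_objective` and `solve_guess` are hypothetical names.

```python
def guess_objective(f, ref, z_nad):
    """Assumed objective of (12): max_i (f_i - z_i^nad) / (z_i^nad - ref_i)."""
    return max((fi - ni) / (ni - ri) for fi, ri, ni in zip(f, ref, z_nad))


def solve_guess(candidates, ref, z_nad):
    """Step 4: solve (12) over a finite candidate list standing in for S."""
    if any(ri >= ni for ri, ni in zip(ref, z_nad)):
        # The scaling is only defined for a reference point below the nadir vector.
        raise ValueError("reference point must lie strictly below the nadir vector")
    return min(candidates, key=lambda f: guess_objective(f, ref, z_nad))
```

Moving the reference point towards the ideal value of one objective enlarges that objective's denominator and thereby steers the solution towards improving it, which is how the DM directs the search from one iteration to the next.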

The only stopping rule is the satisfaction of the DM. No guidance is given 
to the DM in setting new aspiration levels. This is typical of many reference 
point-based methods. The GUESS method is an ad hoc method. The existence 
of a value function would not help in determining new reference points or upper 
or lower bounds for the objective functions. 

The weakness of the GUESS method is its heavy reliance on the availability 
of the nadir objective vector. As mentioned earlier, the nadir objective vector is 
not easy to determine and it is usually only an approximation. 

An interesting practical observation is mentioned in [3]. Namely, DMs are 
easily satisfied if there is a small difference between the reference point and the 
solution obtained. Somehow they feel a need to be satisfied when they have 
almost achieved what they wanted. In this case they may stop iterating ‘too 
early.’ The DM is naturally allowed to stop the solution process if the solution
really is satisfactory. However, the coincidence of setting the reference point near an
attainable solution may unnecessarily increase the DM’s satisfaction.



6.5 Satisficing Trade-Off Method 

The satisficing trade-off method (STOM) (see, e.g., [25]) is based on ideas similar 
to the two earlier methods with emphasis on finding a satisficing solution. The 
differentiating factor is the trade-off information utilized. 

The functioning of STOM is the following. After a (weakly) Pareto optimal 
solution has been obtained by optimizing a scalarizing function, it is presented to 
the DM. On the basis of this information (s)he is asked to classify the objective 
functions into three classes. The classes are the unacceptable objective functions 
whose values (s)he wants to improve, the acceptable objective functions whose 
values (s)he agrees to relax (impair) and the acceptable objective functions whose 
values (s)he accepts as they are. 

The objective and the constraint functions are assumed to be twice con- 
tinuously differentiable. Under some additional special assumptions, trade-off 
information can be obtained from the KKT multipliers related to the scalarizing 
function. With this information, appropriate upper bounds can be determined 
for the functions to be relaxed. Thus, the DM only has to specify aspiration 
levels for functions to be improved. This is called automatic trade-off. Next, a 
modified scalarizing function is minimized and the DM is asked to classify the 
objective functions at the new solution.

Different scalarizing functions have been suggested for use in STOM. In the
original formulation, the weighted Tchebycheff metric is used and the weights
are set as $w_i = 1/(\bar{z}_i - z_i^{**})$ for $i = 1, \dots, k$, where $\bar{z}$ is a reference point and $z^{**}$ is
a utopian objective vector so that $\bar{z} > z^{**}$. If weakly Pareto optimal solutions
are to be avoided, the scalarizing function can be augmented as described in
Sect. 4.3.
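The effect of this weighting can be seen in a small Python sketch. It assumes the weights $w_i = 1/(\bar{z}_i - z_i^{**})$, i.e. the reciprocal of the gap between the reference point and the utopian objective vector; the function names are hypothetical.

```python
def stom_weights(ref, z_utopian):
    """Assumed STOM weights: reciprocal of the gap between aspiration and utopian values."""
    if any(r <= u for r, u in zip(ref, z_utopian)):
        raise ValueError("reference point must be strictly greater than the utopian vector")
    return [1.0 / (r - u) for r, u in zip(ref, z_utopian)]


def stom_scalarizing(f, ref, z_utopian):
    """Weighted Tchebycheff distance of f from the utopian vector with STOM weights."""
    w = stom_weights(ref, z_utopian)
    return max(wi * (fi - ui) for wi, fi, ui in zip(w, f, z_utopian))
```

With these weights the reference point itself has scalarized value 1, so an objective vector scores at most 1 exactly when it meets every aspiration level. This normalisation is one way to read the role of the reference point in the scalarizing function.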

Even if a value function existed, it could not be directly used to determine 
the functions to be decreased and increased or the amounts of change. Thus the 
method is characterized as an ad hoc method. If automatic trade-off is not used, 
the method resembles the GUESS method. 



6.6 Light Beam Search 

Light beam search, described in [13], combines the reference point idea and tools 
of multiattribute decision analysis. The achievement function (8) is used with 
weights only in the maximum part. The reference point is here assumed to be 
an infeasible objective vector. 

It is assumed that the objective and the constraint functions are continuously 
differentiable and the ideal and the nadir objective vectors are available. In 
addition, none of the objective functions is allowed to be more important than 
all the others together. 

In the light beam search, the learning process of the DM is supported by 
providing additional information about the Pareto optimal set at each iteration. 
This means that other solutions in the neighbourhood of the current solution 
(based on the reference point) are displayed. However, an attempt is made to 
avoid frustration on the part of the DM caused, e.g., by indifference between the 
alternatives. 

Concepts used in ELECTRE methods (see, e.g., [26]) are here employed. The 
idea is to establish outranking relations between alternatives. It is said that one 
alternative outranks the other if it is at least as good as the latter. In the light 
beam search, additional alternatives near the current solution are generated so 
that they outrank the current one. Incomparable or indifferent alternatives are 
not shown to the DM. 

To be able to compare alternatives and to define outranking relations, we 
need several thresholds from the DM. The DM is asked to provide indifference 
thresholds for each objective function describing intervals where indifference pre- 
vails. Furthermore, the line between indifference and preference does not have to 
be sharp. The hesitation between indifference and preference can be expressed 
by preference thresholds. One more type of threshold, namely a veto threshold 
can be defined. It prevents a good performance in some objectives from com- 
pensating for poor values on some other objectives. 

Let us now outline the light beam algorithm. 

1. If the DM wants to or can specify the best and the worst values for each
objective function, save them. Alternatively, calculate $z^*$ and $z^{\mathrm{nad}}$. Set $z^*$ as
a reference point. Ask the DM to specify indifference thresholds. If desired,
(s)he can also specify preference and veto thresholds.

2. Minimize the achievement function. 

3. Present the solution to the DM. Calculate k Pareto optimal characteristic 
neighbours and present them as well to the DM. If the DM wants to see 
alternatives between any two of the k + 1 alternatives displayed, set their 
difference as a search direction, take different steps in this direction and 
project them onto the Pareto optimal set before showing them to the DM. 
If desired, save the current solution. 

4. If desired, let the DM revise the thresholds and go to step (3). Otherwise, 
if the DM wants to give another reference point, go to step (2). If, on the 
other hand, the DM wants to select one of the alternatives displayed or 
saved as a current solution, go to step (3). Finally, if one of the alternatives 
is satisfactory, stop. 

Characteristic neighbours are new alternative objective vectors that outrank 
the current solution. See [13] for details. The option of saving desirable solutions 
increases the flexibility of the method. The DM can explore different directions 
and select the best among different trials. 

The light beam search can be characterized as an ad hoc method. If a value 
function were available, it could not directly determine new reference points. It 
could, however, be used in comparing the set of alternatives. Yet, the thresholds 
are important in the method and they must come from the DM. Specifying 
different thresholds is a new aspect when compared to the methods presented 
earlier. This may be demanding for the DM. Anyway, it is positive that the 
thresholds are not assumed to be global but can be altered at any time. 

The light beam search is a rather versatile solution method where the DM 
can specify reference points, compare a set of alternatives and affect the set of 
alternatives in different ways. Thresholds are used to try to make sure that the 
alternatives generated are not worse than the current solution. In addition, the 
alternatives must be different enough to be compared and comparable on the 
whole. This should decrease the burden on the DM. 

6.7 NIMBUS Method 

NIMBUS (Nondifferentiable Interactive Multiobjective BUndle-based optimiza- 
tion System), presented in [20,21,22], is an interactive multiobjective optimiza- 
tion method designed especially to be able to handle nondifferentiable functions 
efficiently. For this reason, it is capable of solving complicated real-world prob- 
lems. It is assumed that the objective and the constraint functions are locally 
Lipschitzian (if a nondifferentiable solver is used) and the ideal objective vector 
is available. 

NIMBUS is based on the classification of the objective functions where the 
DM can easily indicate what kind of improvements are desirable and what kind 
of impairments are tolerable. The DM examines at iteration h the values of 
the objective functions calculated at the current solution $x^h$ and divides the
objective functions into up to five classes; functions $f_i$ whose values

– should be decreased ($i \in I^{<}$),

– should be decreased to a certain aspiration level $\bar{z}_i < f_i(x^h)$ ($i \in I^{\le}$),

– are satisfactory at the moment ($i \in I^{=}$),

– are allowed to increase to a certain upper bound $\varepsilon_i > f_i(x^h)$ ($i \in I^{>}$) and

– are allowed to change freely ($i \in I^{\diamond}$),

where $I^{<} \cup I^{\le} \neq \emptyset$ and $I^{=} \cup I^{>} \cup I^{\diamond} \neq \emptyset$.

The difference between the classes $I^{<}$ and $I^{\le}$ is that the functions in $I^{<}$ are
to be minimized as far as possible but the functions in $I^{\le}$ only as far as the
aspiration level. The classification is the core of NIMBUS. However, the DM can
specify optional positive weights $w_i$ summing up to one.

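After the DM has classified the objective functions, a subproblem

$$
\begin{aligned}
\text{minimize} \quad & \max_{i \in I^{<},\, j \in I^{\le}} \left[ w_i \left( f_i(x) - z_i^{*} \right),\ w_j \max\left[ f_j(x) - \bar{z}_j,\ 0 \right] \right] \\
\text{subject to} \quad & f_i(x) \le f_i(x^h) \ \text{for all } i \in I^{<} \cup I^{\le} \cup I^{=}, \\
& f_i(x) \le \varepsilon_i \ \text{for all } i \in I^{>}, \\
& x \in S
\end{aligned}
\qquad (13)
$$

is formed, where $z_i^{*}$ ($i \in I^{<}$) are components of the ideal objective vector.
The solution of (13) is weakly Pareto optimal if the set $I^{<}$ is nonempty. On
the other hand, any Pareto optimal solution can be found with an appropriate
classification.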
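The structure of the subproblem can be sketched in Python. The sketch is an illustration only and rests on stated assumptions: the feasible region S is a finite list of objective vectors, the five classes are encoded by the hypothetical labels "<", "<=", "=", ">" and "free", and the objective and constraints follow the reading of (13) in which functions to be improved enter the max term while all functions except those allowed to increase or change freely may not get worse than at the current point. The function name and argument layout are hypothetical.

```python
def nimbus_subproblem(candidates, current, classes, z_ideal,
                      aspiration=None, upper=None, weights=None):
    """Solve a finite-set stand-in for subproblem (13).

    classes[i] is one of "<", "<=", "=", ">", "free";
    aspiration and upper map an objective index to its level or bound.
    """
    k = len(current)
    aspiration = aspiration or {}
    upper = upper or {}
    w = weights or [1.0 / k] * k  # default: equal positive weights summing to one
    improve = [i for i in range(k) if classes[i] in ("<", "<=")]
    if not improve:
        raise ValueError("at least one objective must be in class '<' or '<='")

    def feasible(f):
        for i in range(k):
            if classes[i] in ("<", "<=", "=") and f[i] > current[i]:
                return False  # these objectives may not get worse
            if classes[i] == ">" and f[i] > upper[i]:
                return False  # relaxation is limited by the upper bound
        return True

    def objective(f):
        terms = []
        for i in improve:
            if classes[i] == "<":
                terms.append(w[i] * (f[i] - z_ideal[i]))
            else:
                terms.append(w[i] * max(f[i] - aspiration[i], 0.0))
        return max(terms)

    feas = [f for f in candidates if feasible(f)]
    return min(feas, key=objective) if feas else None
```

Objectives in the "free" class appear in neither the objective nor the constraints, which is what allows them to change freely.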

If the DM does not like the solution of (13) for some reason, (s)he can explore 
other solutions between the old and this new one. Then we calculate a search 
direction as a difference of these two solutions and provide more solutions by 
taking steps of different sizes in this direction. 

The NIMBUS algorithm is given below. Note that the DM must be ready 
to give up something in order to attain improvement for some other objective 
functions. 



1. Ask the DM to classify the objective functions at the current point. 

2. Solve the subproblem and present the solution to the DM. If (s)he wants to
see different alternatives between the old and the new solution, go to step
(3). If the DM prefers either of the two solutions and wants to continue from
it, go to step (1). Otherwise, go to step (4).

3. Ask the DM to specify the desired number of alternatives P and calculate 
alternative vectors. Present their Pareto optimal counterparts to the DM and 
let her/him choose the most preferred one among them. If the DM wants to 
continue, go to step (1). 

4. Check Pareto optimality and stop. 

In NIMBUS, the DM expresses iteratively her/his desires. Unlike some other 
methods based on classification, the success of the solution process does not 
depend entirely on how well the DM manages in specifying the classification 
and the appropriate parameter values. It is important that the classification 
is not irreversible. Thus, no irrevocable damage is caused in NIMBUS if the 
solution obtained is not what was expected. The DM is free to go back or explore
intermediate points. (S)he can easily get to know the problem and its possibilities
by specifying, e.g., loose upper bounds and examining intermediate solutions. 

In NIMBUS, the DM can explore the (weakly) Pareto optimal set and change
her/his mind if necessary. The DM can also exclude undesirable solutions from
further consideration.

The method is ad hoc in nature, since the existence of a value function would 
not directly advise the DM how to act to attain her/his desires. A value function 
could only be used to compare different alternatives. 

The method has been implemented as a WWW-NIMBUS system on the 
Internet (see [23]). Via the Internet we can centralize the computing to one 
server computer (at the University of Jyväskylä) and the WWW is a way of
distributing the graphical user interface to the computers of each individual 
user. Besides, the user always has the latest version of NIMBUS available. 

The most important aspect of WWW-NIMBUS is that it is easily accessible 
and available to any Internet user (http://nimbus.mit.jyu.fi/). No special tools, 
compilers or software besides a WWW browser are needed. The user saves the 
trouble of installing any software and the system is independent of the computer 
and the operating system used. 

The system contains both a nondifferentiable local solver (proximal bundle 
method) (see [19], pp. 112-143) and a global solver (genetic algorithms) for the 
subproblem. When the first version of WWW-NIMBUS was implemented in 
1995 it was a pioneering interactive optimization system on the Internet. 

7 Conclusions 

As has been stressed, a large variety of methods exists for multiobjective op- 
timization problems and none of them can be claimed to be superior to the 
others in every aspect. When selecting a solution method, the specific features 
of the problem to be solved must be taken into consideration. In addition, the 
opinions of the DM are important. One can say that selecting a multiobjective 
optimization method is a problem with multiple objectives itself. 

The theoretical properties of the methods can rather easily be compared. A 
comparative table summarizing some of the features of interactive methods is 
given in [20]. However, in addition to theoretical properties, practical applica- 
bility also plays an important role in the selection of an appropriate method for 
the problem to be solved. The difficulty is that practical applicability is hard to 
determine without experience and experimentation. 

The features of the problem and the capabilities of the DM have to be charted 
before a solution method can be chosen. Some methods may suit some problems 
and some DMs better than others. A decision tree is provided in [20] for easing 
the selection. 

As far as the future is concerned, the obvious conclusion in the development 
of methods is the importance of continuing in the direction of user-friendliness. 
Methods must be even better able to correspond to the characteristics of the DM. 
If the aspirations of the DM change during the solution process, the algorithm 
must be able to cope with this situation. 

Computational tests have confirmed the idea that DMs want to feel in control
of the solution process, and consequently they must understand what is
happening. However, sometimes the DM simply needs support, and this should
be available as well. Thus, the aim is to have methods that support learning so 
that guidance is given whenever necessary. The DM can be supported by using 
visual illustrations and further development of such tools is essential. In addition 
to bar charts, value paths and petal diagrams of alternatives, we may use 3D 
slices of the feasible objective region (see [18]) and other tools. Specific methods 
for different areas of application that take into account the characteristics of the 
problems are also important.

Abstract. This tutorial will review some of the basic concepts related to
evolutionary multiobjective optimization (i.e., the use of evolutionary algorithms
to handle more than one objective function at a time). The most
commonly used evolutionary multiobjective optimization techniques will
be described and criticized, including some of their applications. Theory,
test functions and metrics will also be discussed. Finally, we will provide
some possible paths of future research in this area.



1 Introduction 

Most real-world engineering optimization problems are multiobjective in nature, 
since they normally have several (possibly conflicting) objectives that must be 
satisfied at the same time. The notion of “optimum” has to be redefined in this
context and instead of aiming to find a single solution, we will try to produce
a set of good compromises or “trade-offs” from which the decision maker will 
select one. 

Over the years, the work of a considerable number of operational researchers
has produced many techniques to deal with multiobjective
optimization problems. However, it was not until relatively recently that
researchers realized the potential of evolutionary algorithms in this area.

The potential of evolutionary algorithms in multiobjective optimization was
hinted at by Rosenberg in the 1960s, but this research area, later called
Evolutionary Multi-Objective Optimization (EMOO for short), remained unexplored
for almost twenty-five years. However, researchers from many different disciplines
have shown an increasing interest in EMOO in recent years. The considerable
amount of research related to EMOO currently reported in the literature (over
630 publications) is a clear reflection of such interest.

* This work was done while the author was at the Laboratorio Nacional de Informatica 
Avanzada, Rebsamen 80, Xalapa, Veracruz 91090, Mexico. 

^ The author maintains a repository on Evolutionary Multiobjective Optimization at:
http://www.lania.mx/~ccoello/EMOO/ with a mirror at http://www.jeo.org/emo/

E. Zitzler et al. (Eds.): EMO 2001, LNCS 1993, pp. 21-EJJ 2001. 

(c) Springer-Verlag Berlin Heidelberg 2001

This paper will provide a short tutorial on EMOO, including a review of
the main existing approaches (a description of the technique, together with its 
advantages and disadvantages and some of its applications) and of the most 
significant research done in theory, test functions and metrics. We will finish 
with a short review of two promising areas of future research. 

2 Basic Definitions 

Multiobjective optimization (also called multicriteria optimization, multiperformance
or vector optimization) can be defined as the problem of finding:

a vector of decision variables which satisfies constraints and optimizes a 
vector function whose elements represent the objective functions. These 
functions form a mathematical description of performance criteria which 
are usually in conflict with each other. Hence, the term “optimize” means 
finding such a solution which would give the values of all the objective 
functions acceptable to the designer. 

Formally, we can state it as follows: 

Find the vector $x^* = [x_1^*, x_2^*, \dots, x_n^*]^T$ which will satisfy the $m$ inequality
constraints:

$$g_i(x) \ge 0, \quad i = 1, 2, \dots, m \qquad (1)$$

the $p$ equality constraints

$$h_i(x) = 0, \quad i = 1, 2, \dots, p \qquad (2)$$

and optimizes the vector function

$$f(x) = [f_1(x), f_2(x), \dots, f_k(x)]^T \qquad (3)$$

where $x = [x_1, x_2, \dots, x_n]^T$ is the vector of decision variables.

In other words, we wish to determine from among the set $\mathcal{F}$ of all numbers
which satisfy (1) and (2) the particular set $x_1^*, x_2^*, \dots, x_n^*$ which yields the
optimum values of all the objective functions.

It is rarely the case that there is a single point that simultaneously optimizes 
all the objective functions. Therefore, we normally look for “trade-offs”, rather 
than single solutions when dealing with multiobjective optimization problems. 
The notion of “optimum” is therefore different. The most commonly adopted
notion of optimality is that originally proposed by Francis Ysidro Edgeworth
and later generalized by Vilfredo Pareto. Although some authors call
this notion the Edgeworth-Pareto optimum (see, for example, Stadler), we will
use the most commonly accepted term: Pareto optimum.

We say that a vector of decision variables $x^* \in \mathcal{F}$ is Pareto optimal if there
does not exist another $x \in \mathcal{F}$ such that $f_i(x) \le f_i(x^*)$ for all $i = 1, \dots, k$ and
$f_j(x) < f_j(x^*)$ for at least one $j$.

In words, this definition says that $x^*$ is Pareto optimal if there exists no
feasible vector of decision variables $x \in \mathcal{F}$ which would decrease some criterion
without causing a simultaneous increase in at least one other criterion. Unfortunately,
this concept almost always gives not a single solution, but rather a set
of solutions called the Pareto optimal set. The vectors $x^*$ corresponding to the
solutions included in the Pareto optimal set are called nondominated. The plot
of the objective functions whose nondominated vectors are in the Pareto optimal
set is called the Pareto front.
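For a finite set of objective vectors, the definition translates directly into code. The following Python sketch assumes that all objectives are to be minimized; the names `dominates` and `nondominated` are chosen here for illustration.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and strictly better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def nondominated(points):
    """Keep the objective vectors that no other vector in the list dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]
```

For example, among (1, 4), (2, 2), (3, 3) and (4, 1) only (3, 3) is removed, because (2, 2) is better in both objectives; the remaining three vectors are mutually incomparable and form the trade-off set.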

2.1 An Example 

Let us analyze a simple example of a multiobjective optimization problem that
has been studied by Stadler & Dauer. We want to design the four-bar plane
truss shown in Figure 1. We will consider two objective functions: minimize
the volume of the truss ($f_1$) and minimize its joint displacement $\Delta$ ($f_2$). The
mathematical definition of the problem is:




Fig. 1. A four-bar plane truss.



Minimize 




(4) 



such that:

$$
\begin{aligned}
(F/\sigma) &\le x_1 \le 3(F/\sigma) \\
\sqrt{2}(F/\sigma) &\le x_2 \le 3(F/\sigma) \\
\sqrt{2}(F/\sigma) &\le x_3 \le 3(F/\sigma) \\
(F/\sigma) &\le x_4 \le 3(F/\sigma)
\end{aligned}
\qquad (5)
$$

where $F = 10$ kN, $E = 2 \times 10^5$ kN/cm$^2$, $L = 200$ cm, $\sigma = 10$ kN/cm$^2$.

The global Pareto front of this problem can be obtained by enumeration. The
process consists of iterating on the four decision variables (with a reasonable
granularity) to get a set of points representing the search space. Then, we apply
the concept of Pareto optimality previously defined to the points generated. The
result of this procedure, plotted in objective function space, is shown in Figure 2.
This is the true (or global) Pareto front of the problem.
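The enumeration procedure can be sketched in Python. Note the assumptions: the objective expressions of (4) are not legible in this copy, so `truss_objectives` uses an assumed form of the four-bar truss objectives that should be checked against Stadler & Dauer before any use; the constants assume $E = 2 \times 10^5$ kN/cm$^2$ and $\sigma = 10$ kN/cm$^2$; the grid size and all function names are illustrative. The filter relies on the fact that, with two minimized objectives sorted by $f_1$, a point is nondominated exactly when its $f_2$ is lower than every $f_2$ seen so far.

```python
import itertools
import math

F, E, L, SIGMA = 10.0, 2.0e5, 200.0, 10.0  # kN, kN/cm^2, cm, kN/cm^2 (assumed values)


def truss_objectives(x):
    """ASSUMED objective expressions (volume, displacement); not recoverable from (4) here."""
    x1, x2, x3, x4 = x
    r2 = math.sqrt(2.0)
    volume = L * (2.0 * x1 + r2 * x2 + math.sqrt(x3) + x4)
    displacement = (F * L / E) * (2.0 / x1 + 2.0 * r2 / x2 - 2.0 * r2 / x3 + 2.0 / x4)
    return (volume, displacement)


def grid(lo, hi, n):
    """n equally spaced values from lo to hi inclusive."""
    return [lo + (hi - lo) * j / (n - 1) for j in range(n)]


def pareto_front_2d(points):
    """Nondominated subset for two minimized objectives via a sort and a single sweep."""
    front, best_f2 = [], math.inf
    for p in sorted(points):
        if p[1] < best_f2:
            front.append(p)
            best_f2 = p[1]
    return front


def enumerate_truss(n=6):
    """Evaluate the objectives on an n^4 grid inside the bounds (5) and filter the front."""
    base = F / SIGMA
    r2 = math.sqrt(2.0)
    axes = [grid(base, 3 * base, n), grid(r2 * base, 3 * base, n),
            grid(r2 * base, 3 * base, n), grid(base, 3 * base, n)]
    points = [truss_objectives(x) for x in itertools.product(*axes)]
    return points, pareto_front_2d(points)
```

A finer grid gives a denser approximation of the front at a cost that grows with the fourth power of `n`, which is why enumeration is only practical for small problems like this one.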




Fig. 2. True Pareto front of the four-bar plane truss problem. 



3 Why Evolutionary Algorithms? 

The first implementation of an EMOO approach was Schaffer’s Vector Evaluated
Genetic Algorithm (VEGA), which was introduced in the mid-1980s, mainly
intended for solving problems in machine learning.

Schaffer’s work was presented at the First International Conference on
Genetic Algorithms in 1985. Interestingly, his simple unconstrained two-objective
functions became the usual test suite to validate most of the evolutionary
multiobjective optimization techniques developed during the following
years.

Evolutionary algorithms seem particularly suitable to solve multiobjective 
optimization problems, because they deal simultaneously with a set of possible 
solutions (the so-called population). This allows us to find several members of 
the Pareto optimal set in a single run of the algorithm, instead of having to 
perform a series of separate runs as in the case of the traditional mathematical
programming techniques. Additionally, evolutionary algorithms are less susceptible
to the shape or continuity of the Pareto front (e.g., they can easily deal
with discontinuous or concave Pareto fronts), whereas these two issues are a real
concern for mathematical programming techniques.



4 Reviewing EMOO Approaches 



There are several detailed surveys of EMOO reported in the literature
and this tutorial does not intend to produce a new one. Therefore, we will limit 
ourselves to a short discussion on the most popular EMOO techniques currently 
in use, including two recent approaches that look very promising. 



4.1 Aggregating Functions 

A genetic algorithm relies on a scalar fitness function to guide the search. Therefore,
the most intuitive approach to deal with multiple objectives would be to
combine them into a single function. The approach of combining objectives into
a single (scalar) function is normally called an aggregating function, and it
has been attempted several times in the literature with relative success in problems
in which the behavior of the objective functions is more or less well-known.
An example of this approach is a weighted sum of the form:



$$\min \sum_{i=1}^{k} w_i f_i(x) \qquad (6)$$

where $w_i \ge 0$ are the weighting coefficients representing the relative importance
of the $k$ objective functions of our problem. It is usually assumed that

$$\sum_{i=1}^{k} w_i = 1 \qquad (7)$$

Since the results of solving an optimization model using (6) can vary significantly
as the weighting coefficients change, and since very little is usually known
about how to choose these coefficients, a necessary approach is to solve the same
problem for many different values of $w_i$.
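Such a sweep over the weights can be sketched for two objectives as follows. The two discrete test fronts are illustrative assumptions (they are not taken from the tutorial): one is convex, $f_2 = 1 - \sqrt{f_1}$, and one is concave, $f_2 = 1 - f_1^2$. The function name is hypothetical.

```python
def weighted_sum_minimizers(points, n_weights=101):
    """Minimise w1*f1 + (1 - w1)*f2 over a finite point set for evenly spaced w1 in [0, 1]."""
    found = set()
    for j in range(n_weights):
        w1 = j / (n_weights - 1)
        w2 = 1.0 - w1  # weights sum to one, as in (7)
        found.add(min(points, key=lambda f: w1 * f[0] + w2 * f[1]))
    return found


# Illustrative fronts sampled at f1 = 0.0, 0.1, ..., 1.0 (both objectives minimized).
convex = [(t / 10, 1 - (t / 10) ** 0.5) for t in range(11)]
concave = [(t / 10, 1 - (t / 10) ** 2) for t in range(11)]

convex_hits = weighted_sum_minimizers(convex)
concave_hits = weighted_sum_minimizers(concave)
```

On the convex front different weights reach different interior points, whereas on the concave front every weight vector returns one of the two end points, although all eleven sampled points are Pareto optimal. Solving the same problem for many weight values therefore does not guarantee coverage of the whole front.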

Advantages and Disadvantages. This approach does not require any changes
to the basic mechanism of a genetic algorithm and it is therefore very simple,
easy to implement, and efficient. The approach can work properly in simple
multiobjective optimization problems with few objective functions and convex
search spaces. One obvious problem of this approach is that it may be difficult
to generate a set of weights that properly scales the objectives when little is
known about the problem. However, its most serious drawback is that it cannot
generate proper members of the Pareto optimal set when the Pareto front is
concave, regardless of the weights used.
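The concave-front limitation can be seen on a small example. The sketch below
is illustrative only: the sampled front $f_2 = 1 - f_1^2$, the helper name
`weighted_sum_argmin`, and the weight grid are assumptions made for this
example, with both objectives minimized as in the weighted sum above.

```python
def weighted_sum_argmin(points, weights):
    """Return the objective vector that minimizes the weighted sum."""
    return min(points, key=lambda f: sum(w * fi for w, fi in zip(weights, f)))


# Hypothetical concave Pareto front (minimization): f2 = 1 - f1^2, f1 in [0, 1].
# All 11 sampled points are mutually nondominated.
front = [(i / 10, 1 - (i / 10) ** 2) for i in range(11)]

# Sweep the weights (w1, w2) with w1 + w2 = 1 and collect every point selected.
found = {weighted_sum_argmin(front, (w / 20, 1 - w / 20)) for w in range(21)}

print(sorted(found))
```

Every weight vector in the grid selects one of the two end points of the front;
the nine interior trade-off points are never returned.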
Sample Applications

— Truck packing problems.

— Real-time scheduling.

— Structural synthesis of cell-based VLSI circuits.

Fig. 3. Schematic of VEGA selection.



4.2 VEGA 

Schaffer proposed an approach that he called the Vector Evaluated Genetic
Algorithm (VEGA), which differed from the simple genetic algorithm (GA) only
in the way in which selection was performed. This operator was modified so that
at each generation a number of sub-populations was generated by performing
proportional selection according to each objective function in turn. Thus, for
a problem with k objectives and a population size of M, k sub-populations of
size M/k each would be generated. These sub-populations would be shuffled
together to obtain a new population of size M, on which the GA would apply
the crossover and mutation operators in the usual way. This process is
illustrated in Figure 3.

The solutions generated by VEGA are locally nondominated, but not necessarily
globally nondominated. VEGA presents the so-called “speciation” problem
(i.e., we could have the evolution of “species” within the population which
excel on different objectives). This problem arises because this technique
selects individuals who excel in one objective, without looking at the others.
The potential danger of doing so is that we could have individuals with what
Schaffer called “middling” performance in all dimensions, which could be very
useful for compromise solutions, but that will not survive under this selection
scheme, since they are not in the extreme for any dimension of performance
(i.e., they do not produce the best value for any objective function, but only
moderately good values for all of them). Speciation is undesirable because it is
opposed to our goal of finding compromise solutions.
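The selection step described above can be sketched as follows. This is a
minimal illustration rather than Schaffer's implementation: the function name
`vega_selection` is hypothetical, and it assumes that all objectives are to be
maximized and take strictly positive values, so that they can be used directly
as roulette-wheel weights.

```python
import random


def vega_selection(population, objectives, rng):
    """Build a mating pool from k sub-populations, one per objective.

    Assumes maximization and strictly positive objective values.
    """
    k = len(objectives)
    sub_size = len(population) // k          # M / k individuals per objective
    mating_pool = []
    for f in objectives:
        weights = [f(ind) for ind in population]
        # Proportional (roulette-wheel) selection on this objective alone.
        mating_pool.extend(rng.choices(population, weights=weights, k=sub_size))
    rng.shuffle(mating_pool)                 # shuffle sub-populations together
    return mating_pool


# Hypothetical two-objective example: each individual is its own objective vector.
pop = [(1.0, 9.0), (5.0, 5.0), (9.0, 1.0), (2.0, 2.0)]
pool = vega_selection(pop, [lambda ind: ind[0], lambda ind: ind[1]], random.Random(0))
print(pool)
```

Crossover and mutation would then be applied to `pool` in the usual way. Note
that each individual is only ever scored on one objective at a time, which is
the source of the speciation problem.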



Advantages and Disadvantages. Since only the selection mechanism of the
GA needs to be modified, the approach is easy to implement and it is quite
efficient. However, the “middling” problem prevents the technique from finding
the compromise solutions that we normally aim to produce. In fact, if
proportional selection is used with VEGA (as Schaffer did), the shuffling and
merging of all the sub-populations corresponds to averaging the fitness
components associated with each of the objectives. In other words, under these
conditions, VEGA behaves as an aggregating approach and is therefore subject to
the same problems as such techniques.



Sample Applications

— Optimal location of a network of groundwater monitoring wells.

— Combinational circuit design.

— Design of multiplierless IIR filters.

4.3 MOGA 

Fonseca and Fleming [25] proposed the Multi-Objective Genetic Algorithm
(MOGA). The approach consists of a scheme in which the rank of a certain
individual is determined by the number of individuals in the current population
by which it is dominated. All nondominated individuals are assigned rank 1,
while dominated ones are penalized according to the population density of the
corresponding region of the trade-off surface.

Fitness assignment is performed in the following way [25]:

1. Sort population according to rank.

2. Assign fitness to individuals by interpolating from the best (rank 1) to the
worst (rank $n \le M$) in the way proposed by Goldberg (the so-called
Pareto ranking assignment process), according to some function, usually
linear, but not necessarily.

3. Average the fitnesses of individuals with the same rank, so that all of them
will be sampled at the same rate. This procedure keeps the global population
fitness constant while maintaining appropriate selective pressure, as defined
by the function used.
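The three steps above can be sketched in a few lines. The code is a
simplified illustration, not the authors' implementation: it assumes
minimization, takes the rank as one plus the number of dominating individuals
(so that nondominated individuals receive rank 1), uses a linear interpolation
from M down to 1, and omits fitness sharing. The names `dominates` and
`moga_fitness` are hypothetical.

```python
def dominates(a, b):
    """True if a dominates b (minimization)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def moga_fitness(objs):
    """Return (ranks, fitness) for a list of objective vectors."""
    M = len(objs)
    ranks = [1 + sum(dominates(q, p) for q in objs) for p in objs]
    order = sorted(range(M), key=lambda i: ranks[i])         # step 1: sort by rank
    raw = {i: float(M - pos) for pos, i in enumerate(order)}  # step 2: linear M..1
    fitness = [0.0] * M
    for r in set(ranks):                                      # step 3: average ties
        group = [i for i in range(M) if ranks[i] == r]
        avg = sum(raw[i] for i in group) / len(group)
        for i in group:
            fitness[i] = avg
    return ranks, fitness


ranks, fitness = moga_fitness([(1, 4), (2, 2), (4, 1), (3, 3), (4, 4)])
print(ranks, fitness)
```

In this example the three nondominated vectors share the same averaged fitness,
and the total fitness of the population is the same before and after averaging.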

^ By “middling”, Schaffer meant an individual with acceptable performance,
perhaps above average, but not outstanding for any of the objective functions.
Since the use of a blocked fitness assignment scheme such as the one described
above is likely to produce a large selection pressure that might cause
premature convergence, the authors proposed the use of a niche-formation
method to distribute the population over the Pareto-optimal region. Sharing
is performed on the objective function values, and the authors provided some
guidelines to compute the corresponding niche sizes. MOGA also uses mating
restrictions.



Advantages and Disadvantages. The main strength of MOGA is that it is
efficient and relatively easy to implement. Its main weakness is that, as with
all the other Pareto ranking techniques, its performance is highly dependent on
an appropriate selection of the sharing factor.

MOGA has been a very popular EMOO technique (particularly within the
control community), and it normally exhibits a very good overall performance.

Some Applications

— Fault diagnosis.

— Control system design.

— Wing planform design.

— Design of multilayer microwave absorbers.

4.4 NSGA 

The Nondominated Sorting Genetic Algorithm (NSGA) was proposed by Srinivas
and Deb, and is based on several layers of classifications of the individuals.
Before selection is performed (stochastic remainder proportionate selection
was used), the population is ranked on the basis of domination (using Pareto
ranking): all nondominated individuals are classified into one category (with a
dummy fitness value, which is proportional to the population size). To maintain
the diversity of the population, these classified individuals are shared (in
decision variable space) with their dummy fitness values. Then this group of
classified individuals is removed from the population and another layer of
nondominated individuals is considered (i.e., the remainder of the population is
re-classified). The process continues until all individuals in the population
are classified. Since individuals in the first front have the maximum fitness
value, they always get more copies than the rest of the population. This allows
us to search for nondominated regions, and results in convergence of the
population toward such regions. Sharing, for its part, helps to distribute the
population over this region. Figure 4 (taken from Srinivas and Deb) shows the
general flowchart of this approach.
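The layered classification can be sketched as below. This is a
straightforward illustration under stated assumptions, not the original
implementation: objectives are minimized, the dummy fitness and the sharing
step are omitted, and the names `dominates` and `nondominated_fronts` are
hypothetical.

```python
def dominates(a, b):
    """True if a dominates b (minimization)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def nondominated_fronts(objs):
    """Peel off successive layers of nondominated individuals (by index)."""
    remaining = set(range(len(objs)))
    fronts = []
    while remaining:
        # Individuals not dominated by anyone still in the population.
        front = sorted(
            i for i in remaining
            if not any(dominates(objs[j], objs[i]) for j in remaining)
        )
        fronts.append(front)
        remaining -= set(front)   # remove this layer and re-classify the rest
    return fronts


fronts = nondominated_fronts([(1, 4), (2, 2), (4, 1), (3, 3), (4, 4)])
print(fronts)
```

In the full algorithm each layer would receive a dummy fitness lower than that
of the previous layer, degraded by sharing within the layer.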

^ The use of a ranking scheme based on the concept of Pareto optimality was
originally proposed by Goldberg.

Fig. 4. Flowchart of the Nondominated Sorting Genetic Algorithm (NSGA).



Advantages and Disadvantages. Some researchers have reported that NSGA
has a lower overall performance than MOGA (both computationally and in terms
of quality of the Pareto fronts produced), and it also seems to be more
sensitive to the value of the sharing factor than MOGA. However, Deb et al.
have recently proposed a new version of this algorithm, called NSGA-II, which is
more efficient (computationally speaking), uses elitism, and uses a crowded
comparison operator that keeps diversity without specifying any additional
parameters. The new approach has not been extensively tested yet, but it
certainly looks promising.

Sample Applications

— Airfoil shape optimization.

— Scheduling.

— Minimum spanning tree.

4.5 NPGA 

Horn et al. proposed the Niched Pareto Genetic Algorithm (NPGA), which uses a
tournament selection scheme based on Pareto dominance. Instead of limiting
the comparison to two individuals (as normally done with traditional GAs), a
higher number of individuals is involved in the competition (typically around
10% of the population size). When both competitors are either dominated or
nondominated (i.e., when there is a tie), the result of the tournament is
decided through fitness sharing in the objective domain (a technique called
equivalent class sharing was used in this case) [38].

The pseudocode for Pareto domination tournaments, assuming that all of the
objectives are to be maximized, is presented below. S is an array of the N
individuals in the current population, random_pop_index is an array holding the
N indices of S in a random order, and t_dom is the size of the comparison set.

function selection /* Returns an individual from the current population S */
begin
    shuffle(random_pop_index); /* Re-randomize random index array */
    candidate_1 = random_pop_index[1];
    candidate_2 = random_pop_index[2];
    candidate_1_dominated = false;
    candidate_2_dominated = false;
    for comparison_set_index = 3 to t_dom + 2 do
        /* Select t_dom individuals randomly from S */
        begin
            comparison_individual = random_pop_index[comparison_set_index];
            if S[comparison_individual] dominates S[candidate_1]
                then candidate_1_dominated = true;
            if S[comparison_individual] dominates S[candidate_2]
                then candidate_2_dominated = true;
        end /* end for loop */
    if ( candidate_1_dominated AND NOT candidate_2_dominated )
        then return candidate_2;
    else if ( NOT candidate_1_dominated AND candidate_2_dominated )
        then return candidate_1;
    else
        do sharing;
end
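A runnable counterpart of the pseudocode is sketched below, with 0-based
indexing. It is an illustration under assumptions rather than the authors'
code: the sharing step is replaced by a caller-supplied `tie_break` function
(a hypothetical stand-in for equivalent class sharing), and the names
`dominates_max` and `npga_select` are invented for this example.

```python
import random


def dominates_max(a, b):
    """True if a dominates b when all objectives are maximized."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def npga_select(S, t_dom, rng, tie_break):
    """Return the index of the tournament winner in population S."""
    idx = list(range(len(S)))
    rng.shuffle(idx)                       # re-randomize the index array
    c1, c2 = idx[0], idx[1]                # the two candidates
    comparison = idx[2:2 + t_dom]          # t_dom randomly chosen individuals
    c1_dom = any(dominates_max(S[j], S[c1]) for j in comparison)
    c2_dom = any(dominates_max(S[j], S[c2]) for j in comparison)
    if c1_dom and not c2_dom:
        return c2
    if c2_dom and not c1_dom:
        return c1
    return tie_break(c1, c2)               # placeholder for fitness sharing


S = [(5, 5), (1, 1), (2, 2)]
# Hypothetical tie-break: prefer the larger objective sum (NOT real sharing).
prefer_sum = lambda a, b: a if sum(S[a]) >= sum(S[b]) else b
winners = {npga_select(S, 1, random.Random(seed), prefer_sum) for seed in range(30)}
print(winners)
```

With this small population, individual 1 can never win: it is either beaten
outright by a nondominated rival or loses the tie-break.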
This technique normally requires population sizes considerably larger than
usual with other approaches, so that the noise of the selection method can be
tolerated by the emerging niches in the population.



Advantages and Disadvantages. Since this approach does not apply Pareto
ranking to the entire population, but only to a segment of it at each run, its
main strength is that it is faster than MOGA and NSGA. Furthermore, it
also produces good nondominated fronts that can be kept for a large number
of generations. However, its main weakness is that, besides requiring a sharing
factor, this approach also requires an additional parameter: the size of the
tournament.



Sample Applications 

— Automatic derivation of qualitative descriptions of complex objects.

— Feature selection.

— Optimal well placement for groundwater containment monitoring.

— Investigation of feasibility of full stern submarines.



4.6 Target Vector Approaches 

Under this name we will consider approaches in which the decision maker has
to assign targets or goals that he or she wishes to achieve for each objective.
The GA, in this case, tries to minimize the difference between the current
solution found and the vector of goals (different metrics can be used for that
purpose). The most popular techniques included here are hybrids with Goal
Programming, Goal Attainment, and the min-max approach.



Advantages and Disadvantages. The main strength of these methods is
their efficiency (computationally speaking) because they do not require a Pareto
ranking procedure. However, their main weakness is the definition of the desired
goals, which requires some extra computational effort (normally, these goals are
the optimum of each objective function, considered separately). Furthermore,
these techniques will yield a nondominated solution only if the goals are chosen
in the feasible domain, and such a condition may limit their applicability.



Some Applications 

— Truss design.

— Design of a robot arm.

— Synthesis of low-power operational amplifiers.

^ Pareto ranking is $O(kM^2)$, where k is the number of objectives and M is the
population size.
4.7 Recent Approaches

Recently, several new EMOO approaches have been developed. We consider it
important to discuss briefly at least two of them: PAES and SPEA.

The Pareto Archived Evolution Strategy (PAES) was introduced by Knowles
& Corne. This approach is very simple: it uses a (1+1) evolution strategy
(i.e., a single parent that generates a single offspring) together with a
historical archive that records all the nondominated solutions previously found
(such an archive is used as a comparison set in a way analogous to the
tournament competitors in the NPGA). PAES also uses a novel approach to keep
diversity, which consists of a crowding procedure that divides objective space
in a recursive manner. Each solution is placed in a certain grid location based
on the values of its objectives. A map of such a grid is maintained, indicating
the number of solutions that reside in each grid location. Since the procedure
is adaptive, no extra parameters are required (except for the number of
divisions of the objective space). Furthermore, the procedure has a lower
computational complexity than traditional niching methods. PAES has been used to
solve the off-line routing problem and the adaptive distributed database
management problem.

The Strength Pareto Evolutionary Algorithm (SPEA) was introduced by Zitzler
& Thiele. This approach was conceived as a way of integrating different
EMOO techniques. SPEA uses an archive containing nondominated solutions
previously found (the so-called external nondominated set). At each generation,
nondominated individuals are copied to the external nondominated set. For each
individual in this external set, a strength value is computed. This strength is
similar to the ranking value of MOGA, since it is proportional to the number of
solutions that a certain individual dominates. The fitness of each member
of the current population is computed according to the strengths of all external
nondominated solutions that dominate it. Additionally, a clustering technique
is used to keep diversity. SPEA has been used to explore trade-offs of software
implementations for DSP algorithms and to solve 0/1 knapsack problems.
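The strength-based fitness assignment can be sketched as follows. The text
above only states that strength is proportional to the number of dominated
solutions; the specific normalization used here (strength = n / (N + 1), and
population fitness = 1 + sum of the strengths of the dominating external
members, with lower values being better) is the form commonly given for SPEA
and should be read as an assumption, as should the use of strict dominance,
minimization, and the names `dominates` and `spea_fitness`. Clustering of the
external set is omitted.

```python
def dominates(a, b):
    """True if a dominates b (minimization)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def spea_fitness(external, population):
    """Return (strengths of external members, fitness of population members)."""
    N = len(population)
    # Strength: fraction of the population that each external member dominates.
    strength = [sum(dominates(e, p) for p in population) / (N + 1) for e in external]
    # Fitness (to be minimized): 1 + strengths of all external dominators.
    fitness = [
        1 + sum(s for e, s in zip(external, strength) if dominates(e, p))
        for p in population
    ]
    return strength, fitness


external = [(1, 3), (3, 1)]                 # hypothetical external nondominated set
population = [(2, 4), (4, 2), (4, 4)]       # hypothetical current population
strength, fitness = spea_fitness(external, population)
print(strength, fitness)
```

The individual (4, 4) is dominated by both external members and therefore
receives the worst (largest) fitness value.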



5 Theory 



The most important theoretical work related to EMOO has concentrated on two 
main issues: 






— Studies of convergence towards the Pareto optimum set.

— Ways to compute appropriate sharing factors (or niche sizes).



Obviously, a lot of work remains to be done. It would be very interesting
to study, for example, the structure of fitness landscapes in multiobjective
optimization problems. Such a study could provide some insights regarding
the sort of problems that are particularly difficult for an evolutionary
algorithm and could also provide clues regarding the design of more powerful
EMOO techniques.

Also, there is a need for detailed studies of the different aspects involved in
the parallelization of EMOO techniques (e.g., load balancing, impact on Pareto
convergence, performance issues, etc.), including new algorithms that are more
suitable for parallelization than those currently in use.



6 Test Functions 

The design of test functions that are appropriate to evaluate EMOO approaches
was disregarded in most of the early research in this area. However, in recent
years, there have been several interesting proposals. Deb proposed ways
to create controllable test problems for evolutionary multiobjective
optimization techniques using single-objective optimization problems as a basis.
He proposed to transform deceptive and massively multimodal problems into very
difficult multiobjective optimization problems. More recently, his proposal was
extended to constrained multiobjective optimization problems (in most of the
early papers on EMOO techniques, only unconstrained test functions were used).

Van Veldhuizen and Lamont have also proposed some guidelines to design a test
function suite for evolutionary multiobjective optimization techniques, and have
included in a technical report some sample test problems (mainly combinatorial
optimization problems). In this regard, the literature on multiobjective
combinatorial optimization can be quite useful [2,3]. The benchmarks available
for problems like the multiobjective 0/1 knapsack can be used to validate EMOO
approaches. Such an idea has been explored by a few EMOO researchers, but more
work in this direction is still necessary.

7 Metrics 

Assuming that we have a set of test functions available, the next issue is how 
to compare different EMOO techniques. The design of metrics has been studied 
recently in the literature. The main proposals so far are the following: 

— Van Veldhuizen and Lamont proposed the so-called generational distance,
which is a measure of how close our current Pareto front is to the
true Pareto front (assuming we know where it lies).

— Srinivas and Deb proposed the use of a statistical measure (the chi-square
distribution) to estimate the spread of the population on the Pareto
front with respect to the sharing factor used.

— Zitzler and Thiele proposed two measures: the first concerns the size of
the objective value space which is covered by a set of nondominated solutions
and the second directly compares two sets of nondominated solutions, using
as a metric the fraction of the Pareto front covered by each of them. Several
other similar metrics have also been suggested recently by Zitzler et al.

— Fonseca and Fleming proposed the definition of certain (arbitrary) goals
that we wish the GA to attain; then we can perform multiple runs and apply
standard non-parametric statistical procedures to evaluate the quality of the
solutions (i.e., the Pareto fronts) produced by the EMOO technique under
study, and/or compare it against other similar techniques.
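As an illustration of the first metric in the list above, the sketch below
computes a generational distance between an obtained front and a known true
front. The text does not give the formula; the form used here (the square root
of the sum of squared nearest-neighbor Euclidean distances, divided by the
number of points in the obtained front) is the commonly cited definition and is
an assumption, as is the function name `generational_distance`.

```python
import math


def generational_distance(front, true_front):
    """Distance of an obtained front to a known true Pareto front.

    Assumed form: sqrt(sum_i d_i^2) / n, where d_i is the Euclidean distance
    from the i-th obtained point to its nearest true-front point.
    """
    squared = [min(math.dist(p, q) for q in true_front) ** 2 for p in front]
    return math.sqrt(sum(squared)) / len(front)


true_front = [(0.0, 1.0), (1.0, 0.0)]        # hypothetical known front
obtained = [(0.0, 1.0), (1.0, 1.0)]          # hypothetical algorithm output
print(generational_distance(obtained, true_front))
```

A value of zero means that every obtained point lies on the true front; the
metric says nothing about how well the points are spread along it.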

There are few comparative studies of EMOO techniques where these metrics
have been used, and more comprehensive comparisons are still lacking in the
literature. Also, it is important to consider that most of the previously
mentioned metrics assume that the user can generate the global Pareto front of
the problem under study (using, for example, an enumerative approach), and
that will not be possible in most real-world applications.

8 Promising Areas of Future Research 

There are at least two areas of future research that deserve more attention in 
the next few years: 

— Incorporation of preferences: We should not ignore the fact that the
solution of a multiobjective optimization problem really involves three stages:
measurement, search, and decision making. Most EMOO research tends to
concentrate on issues related to the search of nondominated vectors. However,
these nondominated vectors do not provide any insight into the process
of decision making itself (the decision maker still has to choose manually one
of the several alternatives produced), since they are really a useful
generalization of a utility function under the conditions of minimum information
(i.e., all attributes are considered as having equal importance; in other words,
the decision maker does not express any preferences of the attributes). Thus,
the issue is how to incorporate the decision maker’s preferences into an EMOO
approach so as to guide the search only to the regions of main interest. There
are a few recent proposals in this area, but more research is still needed.
Issues such as the scalability of the preference handling mechanism and the
capability of the approach to incorporate preferences from several decision
makers deserve special attention.

— Emphasis on efficiency: Efficiency has been emphasized in EMOO research
only recently, mainly regarding the number of comparisons performed
for ranking the population, ways to maintain diversity, and procedures
to reduce the computational cost involved in evaluating several (expensive)
objective functions. However, more work is still needed. For
example, EMOO researchers have paid little attention to the use of efficient
data structures. In contrast, operational researchers have used, for example,
domination-free quad trees where a nondominated vector can be retrieved
from the tree very efficiently. Checking if a new vector is dominated by the
vectors in one of these trees can also be done very efficiently. It is therefore
necessary to pay more attention to efficiency issues in the design of new
EMOO approaches, to make them more suitable for real-world applications.
9 Conclusions

This paper has attempted to provide a short tutorial of evolutionary
multiobjective optimization. Our discussion has covered the main EMOO approaches
currently in use, their advantages and disadvantages, and some of their
applications reported in the literature.

We have also discussed briefly the theoretical work done in this area, as well
as some of the research that has attempted to produce benchmarks that are
appropriate to validate EMOO approaches. We also discussed another problem
related to this last issue: the definition of appropriate metrics that allow us
to compare several EMOO techniques. Such metrics should evaluate the capability
of an EMOO approach to produce a sufficient number of elements of the Pareto
optimal set of the problem as well as to spread them appropriately.

Our discussion finishes with a short description of two possible areas of future
research in EMOO: mechanisms that facilitate the incorporation of the user’s
preferences, and the search for efficient procedures and algorithms for
evolutionary multiobjective optimization and for keeping diversity.
Abstract. One of the keys to getting one’s arms around multiple objective
programming is to understand its geometry. With this in mind,
the purpose of this paper is to function as a short tutorial on multiple
objective programming that is accomplished maximally with graphs, and
minimally with text.



1 Introduction 

Consider the multiple objective program

$$
\begin{array}{ll}
\max & \{f_1(x) = z_1\} \qquad (1)\\
 & \ \vdots\\
\max & \{f_k(x) = z_k\}\\
\text{s.t.} & x \in S
\end{array}
$$

where k is the number of objectives, the $z_i$ are criterion values, and S is
the feasible region in decision space. Let $Z \subset R^k$ denote the feasible
region in criterion space where $z \in Z$ iff there exists an $x \in S$ such that
$z = (f_1(x), \ldots, f_k(x))$. Let $K = \{1, \ldots, k\}$. Criterion vector
$\bar{z} \in Z$ is nondominated iff there does not exist another $z \in Z$ such
that $z_i \ge \bar{z}_i$ for all $i \in K$ and $z_i > \bar{z}_i$ for at least one
$i \in K$. The set of all nondominated criterion vectors is designated N
and is called the nondominated set. A point $\bar{x} \in S$ is efficient iff its
criterion vector $\bar{z} = (f_1(\bar{x}), \ldots, f_k(\bar{x}))$ is
nondominated. The set of all efficient points is designated E and is called the
efficient set.

Letting $U : R^k \to R$ be the utility function of the decision maker (DM), any
$z^o \in Z$ that maximizes U over Z is an optimal criterion vector and any
$x^o \in S$ such that $(f_1(x^o), \ldots, f_k(x^o)) = z^o$ is an optimal
solution. We are interested in the efficient and nondominated sets because if U
is coordinatewise increasing (i.e., more is always better than less of each
$f_i(x)$), $z^o \in N$ and any inverse image $x^o \in E$. Thus, instead of
searching all of Z, we need only find the best criterion vector in N to locate
an optimal solution to a multiple objective problem. To overview the nature of
multiple objective programming problems and methods for solving them, we have
the following tutorial topics:
1. Decision Space vs. Criterion Space

2. Ideal Way to Solve a Multiple Objective Program? 

3. Graphically Detecting Nondominated Criterion Vectors 

4. Reference Criterion Vectors 

5. Size of the Nondominated Set 

6. Weighted Tchebycheff Metrics 

7. Points on Smallest Intersecting Contours 

8. Lexicographic Weighted Tchebycheff Sampling Program 

9. T-Vertex λ-Vectors

10. Wierzbicki’s Aspiration Criterion Vector Method 

11. Tchebycheff Method 

12. Why Not Weighted-Sums Approach? 

13. Other Interactive Procedures 




Fig. 1. S in Decision Space

Fig. 2. Z in Criterion Space



2 Tutorial Topics 

1. Decision Space vs. Criterion Space (Figs. 1 and 2). Whereas single
objective programming is typically studied in decision space, multiple objective
programming is mostly studied in criterion space. To illustrate, consider the
two-objective multiple objective linear program (MOLP) of Fig. 1 in which
$c^1 = (3, 1, -2)$ and $c^2 = (-1, 2, 0)$ are the gradients of the objective
functions, and S, the feasible region in decision space, is the unit cube in
$R^3$. Fig. 2 shows the feasible region Z in criterion space in which the eight
$z^i$ are the images of the eight extreme points of S. Note that (a) not all
extreme points of S map into extreme points of Z, (b) Z is at most of
dimensionality k, and (c) Z is not necessarily confined to the nonnegative
orthant.
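The mapping from decision space to criterion space in this example can be
reproduced numerically. The sketch below is only an illustration: the names
`C`, `image`, and `images` are hypothetical, and the gradients are taken as
(3, 1, -2) and (-1, 2, 0) with S the unit cube, as in the text.

```python
from itertools import product

# Rows are the objective gradients of the two-objective MOLP.
C = [(3, 1, -2), (-1, 2, 0)]


def image(x):
    """Map a point x in decision space to its criterion vector z = Cx."""
    return tuple(sum(c * xi for c, xi in zip(row, x)) for row in C)


# Images of the eight extreme points (vertices) of the unit cube.
images = {x: image(x) for x in product((0, 1), repeat=3)}
for x, z in images.items():
    print(x, "->", z)
```

Several of the resulting criterion vectors have negative components, which
illustrates observation (c): Z is not confined to the nonnegative orthant.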
2. Ideal Way to Solve a Multiple Objective Program? (Fig. 3). While
one might consider maximizing U over Z as in Fig. 3 to be an ideal way to
solve a multiple objective program, this approach does not work in practice
because (a) of the impossibility of obtaining U, (b) the approach does not give
the user a feel for the nondominated set N, and (c) the approach does not
allow for “learning” during the course of the solution process. Therefore,
without explicit knowledge of U, the emphasis is on methods that only rely on
implicit information (e.g., extracting from the DM answers to questions such as
which in a group of solutions is the most preferred, or which objectives can be
relaxed to enable more achievement in others).




Fig. 3. Maximizing U over Z 



3. Graphically Detecting Nondominated Criterion Vectors (Figs. 4-7). Let
$R^k_+$ be the nonnegative orthant in $R^k$. To determine graphically whether
a $z \in Z$ is dominated or nondominated, translate $R^k_+$ to z. This forms the
set called the translated nonnegative orthant at z, designated
$\{z\} \oplus R^k_+$, where $\oplus$ denotes set addition.

Theorem 1. Criterion vector $z \in Z$ is nondominated iff
$(\{z\} \oplus R^k_+) \cap Z = \{z\}$.

In other words, a $z \in Z$ is nondominated if and only if no point other than z
itself resides in the intersection between the nonnegative orthant translated to
z and Z. In the integer multiple objective program of Fig. 4 in
which Z = {z‘‘ \1 <i <%}, N = For instance, z^ is nondominated 

because if there were a z'" G Z dominating z^, it would have to be in the trans- 
lated nonnegative orthant. On the other hand, z® is dominated because z® is in 
the intersection. In the MOLP of Fig. 0 N = bls[z^ ^ z^], where the notation 
designates the boundary line segment in the clockwise direction from z^ to z^. 
For instance, z'^ is nondominated because the intersection is empty other than 
for z^. On the other hand, z^ is dominated. With regard to the extreme points 
of Z, of the five, two are nondominated. In the multiple objective program of
Fig. El N = {z^} U bls( 2 :^ ^ z^], where the left parenthesis in the “bis” signifies 
that z^ is an open endpoint. This occurs because z'^ is dominated by z^ (i.e., 



2^ G 



({z'^} 0 i?"*')). In Fig. [T] iV = bls[z^ ^ U bls[z^ — > z'^] U bls(z^ 




Fig. 4. Nondominance in an Integer Case

Fig. 5. Nondominance in an MOLP

Fig. 6. Nondominance in a Nonlinear Case

Fig. 7. Nondominance in Another Nonlinear Case



4. Reference Criterion Vectors (Fig. 8). Let $K = \{1, \ldots, k\}$ and let
$z^{ref} \in R^k$ be a reference criterion vector whose components are given by

$$z_i^{ref} = \max \{f_i(x) \mid x \in S\} + \epsilon_i$$

where the $\epsilon_i$ need only be small positive values. An often convenient
scheme is to use values for $\epsilon_i$ that raise each $z_i^{ref}$ to the
smallest integer greater than $\max \{f_i(x) \mid x \in S\}$ as in Fig. 8. A
$z^{ref}$ serves two purposes. One is to define the domain of the problem
$D = \{z \in R^k \mid z_i \le z_i^{ref}\}$, the region to the “lower left” of the
dashed lines in which everything relevant happens. The other, since $z^{ref}$
dominates all points in N, is to function as a point from which downward
probes can be made to sample N.




Fig. 8. Construction of a Reference Point 



5. Size of the Nondominated Set (Table 1). Now a few facts about the
nondominated set. While N is always connected in the linear case, N can be
disconnected in integer and nonlinear cases. Also, N is always a portion of the
surface of Z. However, since the portion is often quite large, finding a most
preferred point in N is often not a trivial task. From computational results
gleaned from Steuer [10], Table 1 indicates the numbers of nondominated extreme
points typically possessed by MOLPs by size. While N grows with the number of
variables and constraints, it grows most substantially with the number of
objectives.



Table 1. Nondominated Extreme Point Indications of Size of N

MOLP Problem Size    Typical Number of Nondominated Extreme Points
3 × 30 × 25          100
3 × 50 × 50          500
4 × 30 × 50          1000
5 × 40 × 25          3000
6. Weighted Tchebycheff Metric (Fig. 9). To compute a distance between
a $z \in Z$ and $z^{ref}$, it is often useful to employ a λ-weighted Tchebycheff
metric

$$\| z - z^{ref} \|_\infty^\lambda = \max_{i \in K} \{\lambda_i \, | z_i - z_i^{ref} |\}$$

where (a) $\lambda \in \Lambda = \{\lambda \in R^k \mid \lambda_i \in (0, 1),
\sum_{i \in K} \lambda_i = 1\}$ and (b) associated with each λ-weighted
Tchebycheff metric is a probing ray emanating from $z^{ref}$ in the downward
direction $-(1/\lambda_1, \ldots, 1/\lambda_k)$. The contours (of points in
$R^k$ equidistant from $z^{ref}$) of a given λ-weighted Tchebycheff metric form
a family of rectangles centered at $z^{ref}$. Moreover, in the domain D of a
problem, the vertices of all of the rectangles lie along the probing ray. In
Fig. 9, with the probing ray in the direction $-(2, 1)$, the rectangles are
contours of the $\lambda = (1/3, 2/3)$ weighted Tchebycheff metric. With this
metric, the point of Z lying on the smallest rectangle is closest to $z^{ref}$,
the point on the next smallest rectangle is the next closest, and so forth. Note
that by changing the λ-vector, we change the direction of the probing ray and
thus the shape of the rectangular contours.
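The metric can be evaluated directly on a finite set of criterion vectors.
The sketch below is illustrative: the three points in `Z`, and the helper
names `reference_vector` and `weighted_tchebycheff`, are assumptions made for
the example. The reference vector is built with the integer rounding scheme of
topic 4, and the distance is the maximum over i of
$\lambda_i |z_i - z_i^{ref}|$ with $\lambda = (1/3, 2/3)$.

```python
import math


def reference_vector(Z):
    """Raise each coordinate maximum to the smallest integer strictly above it."""
    return tuple(math.floor(max(column)) + 1 for column in zip(*Z))


def weighted_tchebycheff(z, z_ref, lam):
    """Lambda-weighted Tchebycheff distance between z and z_ref."""
    return max(l * abs(zi - ri) for l, zi, ri in zip(lam, z, z_ref))


Z = [(0, 4), (3, 3), (4, 0)]          # hypothetical criterion vectors
z_ref = reference_vector(Z)
lam = (1 / 3, 2 / 3)                  # probing ray in direction -(3, 3/2), i.e. -(2, 1)

# Order the points from closest to farthest from z_ref.
ranked = sorted(Z, key=lambda z: weighted_tchebycheff(z, z_ref, lam))
print(z_ref, ranked)
```

Changing `lam` changes the ordering, which is how different weight vectors
probe different parts of the nondominated set.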




Fig. 9. Determining Points Closest to $z^{ref}$



7. Points on Smallest Intersecting Contours (Figs. 10-12). In Fig. 10,
N = bls[z^ ^ z^]. With the probing ray as drawn, it is only necessary to show 
the portion of the smallest intersecting rectangular contour in D to see that z^ 
is the point in Z closest to z’’®-!'. Because the portion of any rectangular contour 
in D also portrays a translated nonnegative orthant, we further observe that z^ 
is nondominated. Several questions now arise. What happens if 

(a) the point on the smallest intersecting contour does not occur at
the vertex of the contour as in Fig. 11?

(b) there is more than one point on the smallest intersecting contour
as in Fig. 12? Are they all nondominated?
Fig. 10. Point Encountered with Vertex




Fig. 11. Point Encountered on Side 




As long as the point on the smallest intersecting contour is unique, by
Theorem 2 (a generalization of Theorem 1), the point is nondominated.

Theorem 2. Let $z \in Z$ and let $z^v \in R^k$ function as the vertex of a
translated nonnegative orthant such that $(\{z^v\} \oplus R^k_+) \cap Z = \{z\}$.
Then z is nondominated.

In answer to (b), let $Z_\lambda \subset Z$ be the set of points on the smallest
intersecting contour. By Theorem 3, at least one point in $Z_\lambda$ is
nondominated.

Theorem 3. Let $\lambda \in \Lambda$. Then z is nondominated if it is a point in
$Z_\lambda$ that is closest to $z^{ref}$ according to an $L_1$-metric.

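In Fig. 12, where $Z_\lambda$ is a line segment, the point seen to be nondominated is the 
point in $Z_\lambda$ that minimizes the sum of the coordinate deviations between it and 
$z^{\mathrm{ref}}$; in other words, it is the point that solves 

$$\min \Big\{ \sum_{i=1}^{k} (z_i^{\mathrm{ref}} - z_i) \;\Big|\; z \in Z_\lambda \Big\}$$
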
8. Lexicographic Weighted Tchebycheff Sampling Program. With regard 
to the ability of a weighted Tchebycheff metric to generate a nondominated 
criterion vector, we have a two-stage process. In the first stage, we compute $Z_\lambda$, 
the set of points in $Z$ closest to $z^{\mathrm{ref}}$ according to the $\lambda$-weighted Tchebycheff 
metric. If $Z_\lambda$ is a singleton set, the point in $Z_\lambda$ is the nondominated point 
generated. If $Z_\lambda$ contains more than one point, the second stage is called into action 
to compute, as the nondominated point generated, a point in $Z_\lambda$ closest to 
$z^{\mathrm{ref}}$ according to the $L_1$-metric. Incorporating this geometry into an 
optimization problem, we have the two-stage lexicographic weighted Tchebycheff 
sampling program 



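$$
\begin{aligned}
\text{lex min}\quad & \Big\{ \alpha,\ \sum_{i=1}^{k} (z_i^{\mathrm{ref}} - z_i) \Big\} \\
\text{s.t.}\quad & \alpha \ge \lambda_i (z_i^{\mathrm{ref}} - z_i), \quad i = 1, \dots, k \\
& f_i(x) = z_i, \quad i = 1, \dots, k \\
& x \in S \\
& 0 \le \alpha \in R, \quad z \in R^k \text{ unrestricted}
\end{aligned}
\tag{2}
$$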

generating not only a nondominated criterion vector, but an inverse image (i.e., 
an efficient point) from $S$ that goes with it. Note that the first optimization 
stage minimizes the scalar $\alpha$ to implement the $\lambda$-weighted Tchebycheff metric, 
and that the second stage, when invoked, minimizes $\sum_{i=1}^{k} (z_i^{\mathrm{ref}} - z_i)$ to 
implement the $L_1$-metric. It is called a “sampling” program because in this way, 
with a group of dispersed representatives from $\Lambda$, and solving (2) for each of 
them, we have a strategy for sampling points from $N$. 
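
The two-stage logic can be sketched in Python for the special case where $Z$ is a finite list of criterion vectors, so that no mathematical programming solver is needed. This is only an illustration under stated assumptions: the function name, the tolerance, and the sample data are hypothetical, all objectives are maximized, and $z^{\mathrm{ref}}$ is assumed to exceed every point of $Z$ in every coordinate.

```python
def lex_weighted_tchebycheff(Z, z_ref, lam, tol=1e-9):
    """Two-stage selection over a finite list Z of criterion vectors.

    Stage 1: minimize alpha = max_i lam[i] * (z_ref[i] - z[i]).
    Stage 2: among the stage-1 minimizers, minimize sum_i (z_ref[i] - z[i]).
    """
    def alpha(z):
        return max(l * (r - zi) for l, r, zi in zip(lam, z_ref, z))

    best_alpha = min(alpha(z) for z in Z)
    # Z_lambda: every point lying on the smallest intersecting contour.
    z_lambda = [z for z in Z if alpha(z) <= best_alpha + tol]
    # L1 tie-break picks a nondominated member of Z_lambda.
    return min(z_lambda, key=lambda z: sum(r - zi for r, zi in zip(z_ref, z)))

# Hypothetical data: (3, 2) and (4, 2) tie in stage 1, and (4, 2) dominates (3, 2).
Z = [(3, 2), (4, 2), (1, 3)]
chosen = lex_weighted_tchebycheff(Z, z_ref=(5, 4), lam=(0.5, 0.5))
print(chosen)
```

In this example the first stage alone cannot separate the two tied points; the second stage discards the dominated one.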

9. T-Vertex $\lambda$-Vectors. Note that in some cases, many different $\lambda$-vectors could 
cause the lexicographic weighted Tchebycheff sampling program to generate the 
same nondominated point. However, out of each such set, one is special and it 
is called the T-vertex $\lambda$-vector. The T-vertex $\lambda$-vector is the one that causes the 
smallest intersecting contour to hit the nondominated point in question head-on 
with its vertex. For a given $z \in N$ and $z^{\mathrm{ref}}$, the coordinates of a T-vertex 
$\lambda$-vector are given by 

$$\lambda_i = \frac{1}{z_i^{\mathrm{ref}} - z_i} \left[ \sum_{j=1}^{k} \frac{1}{z_j^{\mathrm{ref}} - z_j} \right]^{-1} \tag{3}$$

10. Wierzbicki’s Aspiration Criterion Vector Method (Figs. 13 and 14). 

As an introduction to the world of interactive procedures of multiple objective 
programming, we first discuss Wierzbicki’s Aspiration Criterion Vector method 
[12]. The purpose of the procedure is to find at each iteration $h$ the point $z^{(h)} \in N$ 
that is closest to that iteration’s aspiration criterion vector $q^{(h)} < z^{\mathrm{ref}}$, where 
$q^{(h)}$’s purpose is to capture the DM’s criterion value hopes, expectations, and 
aspirations of the moment. 

Fig. 13. Wierzbicki 1-st Iteration Fig. 14. Wierzbicki 2-nd Iteration 

1. $h = 0$. Establish a $z^{\mathrm{ref}} \in R^k$. 

2. $h = h + 1$. Specify an aspiration criterion vector $q^{(h)} < z^{\mathrm{ref}}$. 

3. Compute the $\lambda^{(h)} \in \Lambda$ that causes the probing ray to pass 
through $q^{(h)}$. Equation (3) can be used by substituting $q^{(h)}$ for $z$. 

4. Using $\lambda^{(h)}$ in (2), obtain $z^{(h)}$, the nondominated point closest 
to $q^{(h)}$ as computed by the lexicographic weighted Tchebycheff 
sampling program. 

5. If after examining $z^{(h)}$ the DM decides to continue with another 
iteration, go to Step 2. Otherwise, terminate with $z^{(h)}$ as the 
final solution. 

Consider the feasible region $Z$ in Figs. 13 and 14 with $z^{\mathrm{ref}} = (5,4)$. On the 
first iteration, the DM specifies an aspiration criterion vector $q^{(1)}$, from which 
$\lambda^{(1)}$ follows by (3), and the nondominated point generated by (2) is $z^{(1)}$. Continuing 
with another iteration, the DM specifies $q^{(2)} = (3,1)$. Then $\lambda^{(2)} = (3/5, 2/5)$ and 
the nondominated point generated by (2) is $z^{(2)}$. This iteration is interesting 
because, recognizing that $q^{(2)}$ is dominated, the method produces a superior 
nondominated point. Lacking a formal stopping criterion, the method 
continues as long as the DM is willing to specify new aspiration criterion vectors. 
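
Step 3 of the procedure can be checked numerically. The short Python sketch below evaluates Equation (3) for the second iteration of the example, using $z^{\mathrm{ref}} = (5,4)$ and the aspiration vector $(3,1)$ from the text. The function name is an assumption, and the sketch presumes $z^{\mathrm{ref}}$ is strictly greater than the aspiration vector in every coordinate so that no division by zero occurs.

```python
def t_vertex_lambda(z_ref, z):
    """Equation (3): weights proportional to 1 / (z_ref[i] - z[i]), normalized to sum to 1."""
    inv = [1.0 / (r - zi) for r, zi in zip(z_ref, z)]
    total = sum(inv)
    return [v / total for v in inv]

# Second iteration of the example: aspiration vector (3, 1), reference point (5, 4).
lam2 = t_vertex_lambda((5, 4), (3, 1))
print(lam2)  # approximately [0.6, 0.4]
```

With these weights, $\lambda_1 (5 - 3) = \lambda_2 (4 - 1)$, which is exactly the condition for the probing ray to pass through the aspiration vector.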

11. Tchebycheff Method (Figs. 15 and 16). Instead of generating only one 
solution at each iteration, the Tchebycheff Method conducts multiple probes 
by sampling each in a sequence of progressively smaller subsets of $N$. Letting $P$ 
be the number of solutions presented to the DM at each iteration, we begin 
by selecting $P$ dispersed $\lambda$-vectors from $\Lambda^{(1)} = \Lambda$ to form a group of dispersed 
probing rays as in Fig. 15. Then the lexicographic weighted Tchebycheff program 
(2) is solved for each of the $\lambda$-vectors. From the $P$ resulting nondominated 
criterion vectors, the DM selects the most preferred, designating it $z^{(1)}$. Now, 
about the T-vertex $\lambda$-vector defined by $z^{(1)}$ and $z^{\mathrm{ref}}$, a reduced subset $\Lambda^{(2)} \subset \Lambda$ 
is centered. Then $P$ dispersed $\lambda$-vectors are selected from $\Lambda^{(2)}$ to form a group 
of more concentrated probing rays as in Fig. 16 and (2) is solved for each of 
them. From the $P$ resulting nondominated criterion vectors, the DM selects the 
most preferred, designating it $z^{(2)}$. Now, about the T-vertex $\lambda$-vector defined by 
$z^{(2)}$ and $z^{\mathrm{ref}}$, a further reduced subset $\Lambda^{(3)} \subset \Lambda$ is centered. Then $P$ dispersed 
$\lambda$-vectors are selected from $\Lambda^{(3)}$, and so forth. 




Fig. 15. Tchebycheff 1-st Iteration 




Fig. 16. Tchebycheff 2-nd Iteration 



12. Why Not Weighted-Sums Approach? One might ask why we have not considered 
assigning a $\lambda_i$-weight to each objective function and solving the weighted-sums 
program 

$$\max \Big\{ \sum_{i=1}^{k} \lambda_i f_i(x) \;\Big|\; x \in S \Big\}$$

The reason is that the weighted-sums program only computes points that support 
hyperplanes and is unable to compute unsupported nondominated criterion 
vectors. A $z \in N$ is unsupported if and only if there exists a convex combination 
of other nondominated criterion vectors that dominates $z$. Otherwise, $z$ is 
supported. Let $N^u$ and $N^s$ designate the sets of unsupported and supported 
nondominated criterion vectors, respectively. For example in Fig.0 A“ = {z^} 
and A® = In Fig. 0 A“ = bls(z2 ^ ^3) and A® = {z\z^}, and so 

forth. If an unsupported point were optimal, the weighted-sums program would 
be unable to compute it. In contrast, the lexicographic weighted Tchebycheff 
program can compute any nondominated criterion vector without reservation. 
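
The contrast can be demonstrated on a tiny finite example. In the Python sketch below, the three criterion vectors, the reference point, and the weight grid are all assumptions chosen for illustration (maximization in both criteria). The point (2, 2) is nondominated but unsupported, since the midpoint (2.5, 2.5) of the other two points dominates it. For brevity only the first (Tchebycheff) stage of program (2) is used, without the $L_1$ tie-break.

```python
# Hypothetical nondominated set: (2, 2) is unsupported.
Z = [(4, 1), (2, 2), (1, 4)]
z_ref = (5, 5)

def weighted_sum_winner(lam):
    """Point of Z maximizing the weighted sum of its criterion values."""
    return max(Z, key=lambda z: sum(l * zi for l, zi in zip(lam, z)))

def tchebycheff_winner(lam):
    """Point of Z minimizing the lambda-weighted Tchebycheff distance to z_ref."""
    return min(Z, key=lambda z: max(l * (r - zi) for l, r, zi in zip(lam, z_ref, z)))

# Sweep a grid of weight vectors with nonnegative components summing to one.
grid = [(i / 100, 1 - i / 100) for i in range(101)]
ws_found = {weighted_sum_winner(lam) for lam in grid}
tc_found = {tchebycheff_winner(lam) for lam in grid}

print(sorted(ws_found))  # (2, 2) never appears
print(sorted(tc_found))  # all three points appear
```

No weight vector makes (2, 2) a weighted-sums optimum, because one of the two extreme points always scores at least 2.5 against its 2, whereas the weights (0.5, 0.5) make it the unique Tchebycheff minimizer.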

13. Other Interactive Procedures. In addition to the Aspiration Criterion 
Vector and Tchebycheff methods, there are other procedures of multiple objective 
programming including STEM [1], Global Shooting [2], TRIMAP [3], Light Beam 
Search [5], Pareto Race [6], Bi-Reference Procedure [7], Fuzzy Satisficing [8], 
PSI [9], and FFANN [11], among others. While embodying various philosophies, 
most of these procedures nonetheless use variants of the lexicographic weighted 
Tchebycheff program [4] to probe $N$ in different ways. 
3 Conclusion 

Because the lexicographic weighted Tchebycheff sampling program and its vari- 
ants in other procedures can be configured for solution using conventional single 
criterion mathematical programming software, the procedures of multiple objec- 
tive programming can generally address problems with as many constraints and 
variables as in the single criterion case. However, there is a limit to the num- 
ber of objectives. While problems with up to 5-6 objectives can generally be 
accommodated, above this number gets us into uncharted territory where future 
research is needed. 
Abstract. Ill-definition, uncertainty and multiple objectives are primary 
characteristics of real-world decision-making processes. During the initial 
stages of such processes little knowledge appertaining to the problem at hand 
may be available. A primary task relates to improving problem definition in 
terms of variables, constraints and both quantitative and qualitative objectives. 
The problem space develops with information gained in a dynamical process 
where optimisation plays a secondary role following the establishment of a 
well-defined problem domain. The paper speculates upon the role of 
evolutionary computing, complementary computational intelligence techniques 
and interactive systems that support such problem definition where multi- 
objective satisfaction plays a major role. 



1 Introduction 

The author’s area of interest has primarily been within the field of evolutionary 
engineering design particularly relating to the higher levels of the design process 
where problem conceptualisation can represent a highly complex human-centred 
activity supported by a range of relatively basic machine-based models of the problem 
domain. 

The process generally consists of search across an ill-defined space of possible 
solutions using fuzzy objective functions and vague concepts of the structure of the 
final solution. Solutions and partial solutions are explored and assessed in terms of 
their feasibility with regard to those constraints and objectives considered relevant at 
that time. Heuristics, approximation and experimentation play a major role with a 
high degree of flexibility evident in the establishment of domain bounds, objectives 
and constraints. The design environment itself will evolve with the solutions as the 
designer/design team gain understanding of the functional requirements and the 
resulting structures. Simple human and computer-based models which may be largely 
qualitative in nature are utilised in order to establish initial direction. The decision- 
making environment is characterised by uncertainty in terms of lack of available data 
and a poorly defined initial specification. Discovery and the accumulation of 
knowledge appertaining to problem definition and objective preferences are prevalent 
in this highly dynamical human / machine-based process. The following quote [1] 
relating to creative design captures these aspects: 
...problem formulation and reformulation are integral parts of creative design. 
Designers’ understanding of a problem typically evolves during creative design 
processing. This evolution of problem understanding may lead to (possibly radical) 
changes in the problem and solution representations. [....] in creative design, 
knowledge needed to address a problem typically is not available in a form directly 
applicable to the problem. Instead, at least some of the needed knowledge has to be 
acquired from other knowledge sources, by analogical transfer from a different 
problem for example. [...] creativity in design may occur in degrees, where the 
degree of creativity may depend upon the extent of problem and solution 
reformulation and the transfer of knowledge from different knowledge sources to the 
design problem. 



2 Changing Objectives during Decision-Making 

Discovery and knowledge accumulation aspects of problem formulation are common 
across decision-making as a whole. Whether we are designing an engineering system, 
developing a financial strategy or establishing a business process, exploration 
involving initial variables, constraints and objectives will likely result in re- 
formulation of the problem domain through iterative search and user analysis of 
identified solutions. 

This can be illustrated relatively simply through a decision-making process 
familiar to most which, although unrelated to industrial or commercial activities, 
could be seen as analogous in terms of discovery, knowledge accumulation, problem- 
reformulation and the eventual identification of a best compromise solution. 

For illustrative purposes let us therefore consider a job-related relocation to a new 
city and the daunting problem of finding a family home. Initial investigation will 
likely relate to identifying appropriate districts based upon criteria relating to quality 
of local schools; safety / security issues; proximity to places of work, transport, 
highway networks, shopping and leisure facilities etc. plus average price and type of 
housing and overall environment. Other criteria relate directly to the ideal property 
such as maximum cost, number of bedrooms, garden, garage, parking etc. Several of 
the above criteria would be considered hard constraints (i.e. maximum cost) in the 
first instance. 

The decision-making team is the family who would all probably rate the relative 
importance of the above criteria in a slightly different manner and whose opinions 
will carry a varying degree of influence. It is likely that initially there is a pretty clear 
vision of what the ideal property will look like and the preferred location. 

Initial information gathering will provide quantitative and qualitative data relating 
to location from a wide variety of sources some reliable and some based upon 
hearsay. Gradually an overall picture will be established which will result in possible 
elimination of some options and the inclusion of new possibilities. New possible 
locations will be discovered during explorative trips to those already identified. 

As details of properties are gathered it will likely become apparent that the ideal 
solution is hard to find and the concept of compromise becomes a reality. Hard 
constraints may soften whereas objective preferences will constantly be discussed and 
re-defined in the light of accumulated knowledge regarding districts and property 
availability within them. Particular characteristics of areas initially thought to be 
unsuitable may suddenly appear attractive. Search concentration may shift as it is 
discovered that such areas have suitable properties within the pre-set price range. 
Alternatively, the initial hard constraint relating to maximum price may soften as 
close to ideal properties in favoured locations become available. Other possible 
compromises are investigated in an attempt to accommodate increased costs. 

The whole decision-making process becomes an uncertain mix of subjective / 
objective decisions as goal-posts move, objectives rapidly change in nature and 
external pressures relating to time constraints begin to take precedence. At the end of 
the day it is quite probable that the chosen home differs significantly from the one 
first envisaged. That guest bedroom may have been sacrificed and the garden may be 
far smaller but the location is ideal. Alternatively, the route to work may be longer 
and more tortuous but a property close to ideal at the right price in an up-and-coming 
neighbourhood has been found. 

Although a seemingly simple problem the overall search process is highly complex 
and uncertainty, compromise and problem re-definition are inherent features. 
Although a little far removed from commercial and industrial decision-making 
scenarios analogies are apparent. It is suggested that we can learn much from such 
everyday decision-making scenarios and utilise this knowledge when designing 
interactive evolutionary search environments that can support more complex decision- 
making processes. 



3 Knowledge Generation and Extraction 

A machine-based search and exploration environment has been proposed that 
provides relevant problem information to the designer / decision-making team [2,3]. 
The intention is that such information can be processed and subsequent discussion can 
result in the recognition of similarities with other problem areas and the discovery of 
possible alternative approaches. One of the major characteristics of population-based 
evolutionary search is the generation of a large amount of possibly relevant 
information most of which may be discarded through the actions of various operators. 
The development of interactive systems may support the capture of such information 
and its utilisation in the subsequent re-formulation of the problem through the 
application and integration of experiential knowledge. It can be argued that such 
problem re-formulation captures this knowledge which then plays a significant role in 
further evolutionary search relating to the re-defined problem. The re-definition of 
objectives and objective preferences is an important aspect of this evolution of the 
problem space. 

It is apparent that a core activity within the decision-making process relates to the 
gathering of information relating to diverse aspects of the problem space. It has been 
proposed [4] that evolutionary machine-based search and exploration processes 
can play a central role in the generation of such information. 
This moves the utilisation of EC away from application over a set number of 
generations or until some convergence criterion is met to a more continuous 
exploratory process where changes to objective weightings, variable ranges and 
constraints based upon information generated result in a moving, evolving problem 
space. The primary task of such an approach is the design of an optimal problem 
space as opposed to the identification of an optimal solution. 
This theme has been central to much of the author’s previous work where the 
development of EC strategies relating to the higher levels of the design process has 
related to the identification of high performance regions of complex conceptual 
design space (vmCOGAs, [5,6]) or the identification of optimal alternative system 
configurations through the utilisation of dual-agent strategies for search across mixed 
discrete / continuous design hierarchies [7,8]. Other work relates to the concurrent 
satisfaction of both quantitative and qualitative criteria through the integration of 
fuzzy rule bases with evolutionary search [9]. 

More recent work has led to the establishment of an experimental interactive 
evolutionary design system (IEDS) that supports a relatively continuous iterative user 
/ evolutionary search process that involves EC, agent-based approaches and a number 
of other complementary techniques. 

An overview of this research that illustrates the manner in which EC-based 
information gathering can support decision-making in complex multi-objective design 
environments follows. A much more detailed treatment can be found in the 
referenced text whereas the author’s research as a whole is presented in the book 
‘Evolutionary and Adaptive Computing in Engineering Design’ [10]. 



4 Experimental Approaches 



4.1 The Qualitative Evaluation System (QES) 

Early work related to the development of a Qualitative Evaluation System. The QES 
strategy provides support to the designer when attempting to determine trade-offs 
between both quantitative and qualitative criteria. This support utilises a linguistic 
rule base (generated in close collaboration with Rolls Royce turbine engineers) which 
resides within a fuzzy expert system. The rules relate to the comparative 
effectiveness of a GA-generated design solution in terms of manufacturability, choice 
of materials and a number of special preferences relating to in-house capabilities. 
Quantitative aspects of a design are combined with qualitative ratings to generate a 
measure of the overall fitness of the solutions. Domain knowledge concerning 
variable preferences and heuristics is utilised and combined using a concept of 
compromise [11,12]. 

The work concerns the preliminary design of gas turbine engine cooling hole 
geometries. The primary objective is to minimise the mass flow through the radial 
cooling hole passage. Adaptive Restricted Tournament Selection (ARTS [13]) 
identifies a number of single high performance solutions from the design space. The 
qualitative evaluation system receives all the design variable values of each of these 
solutions as inputs and develops an overall qualitative rating concerning the 
effectiveness of the design. 

The ARTS multi-modal GA is utilised to first identify a number of ‘good’ 
quantitative (i.e. minimal mass coolant flow) design solutions. These “good” 
solutions are next evaluated by the QES which takes the variable values of each 
solution as inputs and outputs a qualitative rating for the design. The QES has three 
components, the fuzzifier, the fuzzy inference engine and the defuzzifier. Designer 
knowledge provided by Rolls Royce engineers is stored in a static fuzzy rule base. 
During the fuzzification stage each variable range is divided into five sub-ranges, and 
expressed using linguistic terms. A crisp value for the effectiveness is obtained 
through centre-of-gravity type defuzzification. 
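
The fuzzification and defuzzification steps can be illustrated with a minimal Python sketch. Everything specific here is an assumption rather than the QES implementation: the five linguistic labels, the evenly spaced triangular membership functions, and the discretised centre-of-gravity calculation are generic textbook choices used only to make the idea concrete.

```python
# Hypothetical linguistic terms for the five sub-ranges of a variable.
LABELS = ["very low", "low", "medium", "high", "very high"]

def triangular(x, a, b, c):
    """Triangular membership function with feet at a and c and peak at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def fuzzify(x, lo, hi):
    """Membership of x in five evenly spaced triangular sets covering [lo, hi]."""
    step = (hi - lo) / 4.0
    return {
        label: triangular(x, lo + (k - 1) * step, lo + k * step, lo + (k + 1) * step)
        for k, label in enumerate(LABELS)
    }

def centre_of_gravity(xs, memberships):
    """Crisp output: membership-weighted mean of the sampled output values."""
    total = sum(memberships)
    if total == 0:
        raise ValueError("no rule fired: aggregated membership is zero everywhere")
    return sum(x * m for x, m in zip(xs, memberships)) / total

# A value between two peaks belongs partly to each neighbouring term.
print(fuzzify(0.375, 0.0, 1.0))

# Defuzzify a symmetric output set sampled on [0, 1].
xs = [i / 100 for i in range(101)]
crisp = centre_of_gravity(xs, [triangular(x, 0.25, 0.5, 0.75) for x in xs])
print(crisp)
```

In a full fuzzy expert system the aggregated output set would come from the inference engine applying the rule base; here a single symmetric set stands in for it.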

The knowledge base for the system is developed using fuzzy rules and facts 
embodying qualitative aspects of the design problem in terms of manufacturability, 
choice of materials and some designer’s special preferences. The knowledge base is 
presented in three categories: Inter Variable Knowledge which relates to the relative 
importance of each variable in terms of the objective function; Intra Variable 
Knowledge which relates to preferred subsets of each variable’s range e.g. blade wall 
thickness needs to be low in terms of material cost but high in terms of stress 
considerations; Heuristics which mostly concern specific cases where there is no 
uncertainty concerning the conclusion. 

The inter and intra variable knowledge is then integrated using a concept of 
compromise which is implemented to reduce the severity of qualitative ratings. Inter 
variable knowledge determines the degree of compromise possible on every variable 
(slight compromise, less compromise, compromise and more compromise). A more 
complete definition of the fuzzy rule base can be found in [9 & 11]. 

Having evaluated a solution both quantitatively via the ARTS GA process and 
qualitatively via the QES the results can be combined and presented in a graphical 
manner that facilitates overall understanding of the major aspects of the problem. 
Four solutions for each of three internal cooling-hole geometries (plane, ribbed and 
pedestal) are presented in figure 1 with their relative quantitative fitness plainly shown 
by the major bars of the chart. The qualitative ratings for each solution are then 
shown as a series of embedded, shaded bars. Such a representation contains much 
information in a relatively transparent manner. It is apparent, for instance, that 
although the plane cast internal geometry provides a low quantitative fitness the 
solutions are relatively robust in terms of the qualitative criteria. The qualitative bars 
relating to the other two geometries show a much greater degree of fluctuation. This 
can perhaps provide insight into the problem and criteria characteristics which aid the 
designer in terms of both modelling the system and determining preference rankings 
for each criterion. For instance, if the priority is for a solution that can be considered 
low risk in terms of possible problems relating to manufacturing, material and special 
preferences aspects and losses relating to quantitative performance can be made up 
within other areas of the turbine design then a compromise can be made and a plane 
cast internal geometry can be chosen. If, however, quantitative performance is 
paramount then a pedestal geometry that best satisfies preferences relating to the three 
qualitative criteria may be considered appropriate. 

The QES provides a good indication of the relative merits of high-performance 
solutions in terms of a number of qualitative criteria. A possible problem area here 
however is the flexibility of the rule-base. There is a requirement for on-line rule 
changes that should be easy to implement by users with no knowledge of fuzzy 
inference technologies. A high degree of flexibility in terms of the representation of 
objectives and their weightings is a major requirement during these higher levels of 
the design process. It is possible that the major utility offered by the QES relates to 
more routine design tasks where problem definition is already high. 
Fig. 1. Graphical representation of ARTS and the QES 



4.2 The Interactive Design System 



4.2.1 Introduction 

The QES illustrates the manner in which qualitative criteria can be combined with 
quantitative evaluation but it does not readily support on-line interaction. The 
requirement for a system that supports the on-line extraction of information that can 
be presented to the user in a succinct manner thereby supporting easily implemented 
change has led to an investigation of various techniques that can be combined within 
an overall architecture. 

The satisfaction of multiple objectives (i.e. > 10) is a major requirement and such 
objectives must be entirely flexible in terms of preferences / weightings to allow 
adequate exploration of the problem domain and to support a better understanding of 
the complex interactions between variable space and objective space. 

The developed system involves a number of machine-based processes that can 
communicate as shown simply in figure 2. The user is an integral part of the system 
accepting, analysing and processing information from the system, introducing 
objective change via the Preferences component and variable parameter changes 
directly into the Evolutionary component. The Evolutionary component can operate 
in several modes: 

1. Single evolutionary process 

2. Multiple individual evolutionary processes - no co-operation 

3. Multiple co-evolving processes 
Mode 1 relates to problem decomposition aspects where the Cluster-oriented 
Genetic Algorithms of the Information Gathering component extract information 
relating to variable interaction, variable redundancy and the setting of appropriate 
variable parameter ranges. This is described fully in references [3 & 4] and is not 
included in the following text as it does not directly relate to multi-objective aspects. 




Fig. 2. Components of the Interactive Evolutionary Design System (boxes: 
Co-evolutionary / Stand-alone Multi-objective Processes; Linguistic Preferences / 
Objective Weighting; Decision-maker / Designer; Information Gathering Processes: 
COGAs, Taguchi, etc.) 



4.2.2 COGAs as Multi-objective Information Gatherers 

Mode 2 concerns the concurrent identification of high-performance (HP) regions of 
the problem space relating to individual objectives and the subsequent definition of 
common HP regions where best compromise solutions may be found. Again, the 
COGA techniques of the Information Gathering component are involved. The basic 
structures of COGAs and the associated adaptive filter (AF) have been described in a 
number of papers [4,5,6]. Their function relates to the rapid decomposition of 
complex, multi-variate design space into regions of high-performance. At the COGA 
core is an explorative genetic algorithm (GA). Exploration has initially been 
promoted through variable mutation regimes (vmCOGA) and more recently through 
the integration of various sampling techniques [13]. The AF mechanism extracts and 
scales populations (in terms of fitness) from a continuous GA evolutionary process 
only allowing solutions that lie above a filter threshold to pass into a final clustering 
set. The design exploration capabilities are well described in Parmee and Bonham [4] 
along with extensive discussion relating to possible interactive utilisation. 
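
The filtering idea can be sketched in a few lines of Python. This is an illustrative assumption only: the min-max scaling, the function name, and the sample values are hypothetical, and the adaptive filter described in the cited papers may scale fitness and set its threshold differently.

```python
def adaptive_filter(population, fitnesses, threshold):
    """Scale fitnesses of one generation to [0, 1] and pass solutions at or above the threshold."""
    lo, hi = min(fitnesses), max(fitnesses)
    span = hi - lo
    passed = []
    for solution, fitness in zip(population, fitnesses):
        # If all fitnesses are equal, treat every solution as fully fit.
        scaled = 1.0 if span == 0 else (fitness - lo) / span
        if scaled >= threshold:
            passed.append(solution)
    return passed

# Hypothetical generation of four solutions (higher fitness is better).
population = ["s1", "s2", "s3", "s4"]
fitnesses = [10.0, 20.0, 30.0, 40.0]

final_clustering_set = adaptive_filter(population, fitnesses, threshold=0.6)
relaxed_set = adaptive_filter(population, fitnesses, threshold=0.3)
print(final_clustering_set, relaxed_set)
```

Lowering the threshold admits lower-performance solutions into the clustering set, which enlarges the high-performance region identified for that objective.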

Of particular interest here is the identification of high-performance regions relating 
to differing objectives from consecutive runs of COGA. Identified HP regions can be 
overlaid upon selected two-dimensional hyperplanes described by pairs of variables 
selected from the set of variable parameters that describe the problem space. 

The utilisation of the Evolutionary and Information Gathering components is 
demonstrated within a design domain relating to the preliminary design of military 
aircraft. This is a complex design domain characterised by uncertain requirements and 
fuzzy objectives relating to the long gestation periods between initial design brief and 
realisation of the product. Changes in operational requirements in addition to 
technological advances cause a demand for a responsive, highly flexible strategy 
where design change and compromise are inherent features for much of the design 
period. Design exploration leading to innovative and creative activity must be 
supported. The ability to introduce rapid change to satisfy the many operational, 
engineering and marketing considerations as they themselves change is essential. In 
this case the COGA software is manipulating the BAE Systems mini-CAPS model. 
This model is a much condensed version of the CAPS (Computer-aided Project 
Studies) suite of software for conceptual and preliminary airframe design. Mini- 
CAPS maintains many of the characteristics of the overall suite especially in terms of 
multiple objectives. The nine input variables that define the problem space can 
generate up to thirteen outputs. 

Figure 3a shows high-performance regions relating to three mini-CAPS objectives: 
attained turn rate (ATR), specific excess power (SEP) and Ferry Range (FR) plotted 
on the gross wing plan area / wing aspect ratio variable parameter hyperplane. As 
can be seen regions relating to ATR and FR overlap forming a region containing HP 
compromise solutions. There is no mutually inclusive HP region relating to SEP 
however. By returning to COGA and reducing the severity of the adaptive filter in 
relation to SEP solutions the SEP region can be expanded as shown in figures 3b and 3c 
until a mutually inclusive region involving all objectives is identified. This relaxing of 
the adaptive filter threshold allows lower performance SEP solutions through to the 
final clustering set. This could be considered equivalent to a lessening of the relative 
importance of this objective. 

The technique allows the projection of objective space onto variable space. This 
gives a visual appreciation of the interaction between the various objectives and 
supports the user in the determination of initial preferences concerning their relative 
performance. Current work is investigating the relationship of the solutions within the 
mutually inclusive regions to the non-dominated solutions of the Pareto frontier. This 
work is indicating that a good approximation to the Pareto front is contained within 
these regions which can be realised by identifying the non-dominated solutions 
contained within the high performance regions describing all objectives. 
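
Extracting the non-dominated solutions from a set of candidates, such as those lying in a mutually inclusive region, can be sketched as follows. The objective vectors are hypothetical values for (ATR, SEP, FR), all assumed to be maximized, and the simple pairwise comparison is a generic approach rather than the method used in the work described.

```python
def dominates(a, b):
    """True if a is at least as good as b in every objective and strictly better in one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def non_dominated(points):
    """Return the points that no other point in the list dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

# Hypothetical (ATR, SEP, FR) values for four solutions from a compromise region.
candidates = [(3, 5, 2), (4, 4, 4), (2, 2, 2), (4, 4, 3)]
front = non_dominated(candidates)
print(front)
```

The pairwise check costs on the order of n squared comparisons, which is acceptable here because all solution vectors describing the regions are already available and the sets are of modest size.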

It is stressed that this visual representation is an indication only. Even if mutually 
inclusive compromise regions are apparent in all variable parameter hyperplanes there 
still exists a possibility that these regions do not exist to the extent suggested in the 
graphical representations. This could be due to highly convoluted high-performance 
regions. Current work relates to the development of agent-based systems that check 
the validity of compromise regions across all dimensions. These agents will then 
inform the designer of any possible problem. This is not a complex procedure as all 
solution vectors describing the regions are available. 
Fig. 3. Identification of compromise 
high-performance regions relating 
through filter threshold relaxation 

(a) A common region for Ferry Range 
and Turn Rate has been identified but 
Specific Excess Power objectives cannot 
be satisfied. 

(b) Relaxing the SEP filter threshold 
allows lower fitness solutions through 
and boundary moves 

(c) Further relaxation results in the 
identification of a common region for all 
objectives. 



The aim of this work, however, is to support a better understanding of objective 
interaction and conflict through graphical representation rather than providing a 
succinct and accurate representation of compromise regions or of the Pareto frontier. 
In this sense, the technique again supports the generation of information appertaining 
to the problem at hand where variables and objectives can vary as problem 
knowledge expands. The approach therefore takes into consideration the uncertainties 
and ill-definition inherent in the utilisation of preliminary design models and of the 
degree of initial understanding of the problem domain. A visual representation of the 
degree of possible conflict and the manner in which a changing of objective 
preferences can result in a relaxation of such conflict is considered more viable than 
the utilisation of more sophisticated techniques that identify optimal non-dominated 
solutions that lie upon the true Pareto frontier at this stage. The basic notion of 
‘rubbish in, rubbish out’ must be taken into consideration. Much time could be spent 
upon more definitive analysis to identify Pareto optimal points that prove erroneous 
upon the introduction of more definitive problem models. The strategy therefore 
indicates a probable best way forward rather than the global solution to the problem 
which, at this stage, is poorly defined. 

4.2.2 The Preference Component 

Having gained a better understanding of objective interactions and of problems 
relating to the degree of difficulty likely to be encountered in the satisfaction of initial 
objective preferences a highly flexible interface for the introduction of such 
preferences to a more definitive multi-objective search process is required. 

A methodology that supports on-line variation of design preferences has been 
developed. Efficient exploration across the many different design variants that the 
designer wishes to assess is of more interest than the identification of single optimal 
solutions. The system should therefore be able to further support such exploration 
whilst also suggesting best design direction. 

It is generally accepted that it is easier for the decision-maker to give qualitative 
ratings to objectives i.e. ‘Objective A is much more important than objective B’ than 
to set the weight of objective A to, say, 0.1 or to 0.09. The method of fuzzy 
preferences [14] and induced preference order can be utilised to introduce such rule- 
based preference representation for transformation into appropriate objective 
weightings. The following predicates can be introduced [15]: 



Table 1. Rule-based preference representation 

relation   intended meaning          relation   intended meaning 
≪          is much less important    <          is less important 
≈          is equally important      #          don’t know or don’t care 
¬          is not important          !          is important 

These, together with the complementary relations > and ≫, can help build the 
relationship matrix R necessary for a ‘words to numbers’ transformation. For this 
transformation, concepts of ‘leaving score’ [14], amongst other techniques, can be 
employed. 
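
As a minimal sketch of such a ‘words to numbers’ transformation, the following assumes that the designer's ranking is given as ordered equivalence classes joined by "<" or "<<" links, that transitivity holds, and that the weights are taken as normalised leaving scores (the row sums of R). The numeric value attached to each relation and the function name `preference_weights` are illustrative assumptions; they are not taken from [14] or [15].

```python
# Hypothetical numeric encodings of the relations in Table 1 (ASCII spellings).
# R(a, b) close to 1 means "a is strongly preferred to b".
RELATION_VALUE = {"<<": 0.05, "<": 0.35, "~": 0.5, ">": 0.65, ">>": 0.95}


def preference_weights(classes, links):
    """Turn ranked equivalence classes into objective weights.

    classes: list of lists of objective names, least important class first.
    links:   relation between consecutive classes, each "<" or "<<".
    """
    names = [obj for cls in classes for obj in cls]
    level = {obj: i for i, cls in enumerate(classes) for obj in cls}

    def relation(a, b):
        i, j = level[a], level[b]
        if i == j:
            return "~"  # same equivalence class: equally important
        lo, hi = min(i, j), max(i, j)
        # Transitivity assumption: one "<<" link on the path makes it "<<".
        strong = "<<" in links[lo:hi]
        if i < j:
            return "<<" if strong else "<"
        return ">>" if strong else ">"

    # Leaving score of a: sum of R(a, b) over all other objectives b.
    scores = {
        a: sum(RELATION_VALUE[relation(a, b)] for b in names if b != a)
        for a in names
    }
    total = sum(scores.values())
    return {a: s / total for a, s in scores.items()}


weights = preference_weights(
    [["Mass at Take-off"], ["Take-off Distance", "Landing Speed"], ["Ferry Range"]],
    ["<", "<<"],
)
```

With these assumed values the two objectives in the same class receive the same weight, and the weights sum to one, so they can be passed on directly as numeric weightings.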

It is first necessary for the engineer to rank the objectives in terms of relative 
importance but numeric weightings are not required. If transitivity is assumed, then 
the number of required questions to establish overall preference ratings is reduced. 
The preference algorithm has been described fully in a number of publications [2,15] 
and it is not intended to reproduce it here. Examples from the graphical user interface 
follow which illustrate the preference procedure. The problem domain again relates 
to BAE preliminary airframe design. 

The user first selects those objectives that require investigation. In this case Take- 
off Distance, Landing Speed, Ferry Range and Mass at Take-off have been selected 
from a possible thirteen outputs from the mini-CAPS model. Having selected the 
objectives it is necessary to establish equivalence classes which classify the objectives 
in terms of relative performance. In the example (figure 4), two of the objectives are 
considered to be equally important, and two further objectives are considered to have 
different levels of importance. Three different levels of importance are therefore 
processed. 

Fig. 4. Establishing equivalence classes 

The interface facilitates the on-line change of objective preferences which allows 
further exploratory runs providing information relating to possible problem re- 
definition. Having selected objectives, established objective preferences and 
performed the machine-based ‘words-to-numbers’ transformation the numeric 
weightings can be passed to the Evolutionary process module. 

4.2.3 The Co-evolutionary Multi-objective Approach (Mode 3) 

The Preference component has been linked with the core co-evolutionary processes. 
The goal is to explore high-performance solutions relating to several objectives whilst 
providing maximum information concerning: appropriate regions of complex, multi- 
dimensional, Pareto surfaces; single objective optimal solutions and a number of 
solutions that best satisfy a range of ideal scenarios. This approach is an alternative to 
the generation of n-dimensional trade-off surfaces comprising very large numbers of 
non-dominated solutions that can be achieved using standard EC-based Pareto 
approaches. Again, the intention is to generate information that supports a better 
understanding of the multiple criteria aspects of the problem rather than identify 
Pareto-optimal solutions. 

The distributed method involves individual GAs for the optimisation of each 
objective. Search relating to an individual objective takes place upon each 
evolutionary process. Subsequently, through the application of penalties, the 
co-evolving processes are drawn into that region of the overall space that offers best 
compromise relating to all objectives and their pre-set preferences (Parmee I. et al. 
2000). 

During the co-evolutionary run the fitness for each objective is normalised relative 
to the maximum and minimum values found for each GA with constant adjustment as 
new upper and lower limits are identified. In each generation, the variables of 
solutions relating to each objective are compared with those of the best individual 
from the other co-evolving GA populations. If a variable is outside a range defined by 
a range constraint map it is adjusted by a penalty function. The range constraint map 
reduces maximum allowable distances between variables at each generation. Initially 
the map must allow each GA to produce a good solution based on its own specified 
objective. As the run progresses inflicted penalties increasingly reduce variable 
diversity to draw all concurrent GA searches from their separate objectives towards a 
single compromise design region where all objectives are best satisfied. This process 
is illustrated in figure 5(b) where the individual evolution of each objective is 
projected upon the Ferry Range objective hyperplane. In this case all objectives are of 
equal importance. 
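
The penalty mechanism described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the implementation used in the work: variables and fitness are assumed to be normalised to [0, 1] with fitness maximised, the range constraint map is assumed to shrink linearly with generation, and the names `allowed_range` and `penalised_fitness` are hypothetical.

```python
def allowed_range(generation, max_generations, start=1.0, end=0.1):
    """Hypothetical range constraint map.

    Returns the maximum allowed distance between a variable and the
    corresponding variable of another GA's best individual. It shrinks
    linearly from `start` to `end` as the run progresses.
    """
    frac = min(generation / max_generations, 1.0)
    return start + (end - start) * frac


def penalised_fitness(fitness, solution, best_others, generation,
                      max_generations, weight=1.0):
    """Reduce fitness when variables stray too far from the other GAs' bests.

    solution:    variable values of the individual being evaluated.
    best_others: best individual (variable values) from each other GA.
    weight:      assumed scaling factor, e.g. derived from preference weights.
    """
    limit = allowed_range(generation, max_generations)
    excess = 0.0
    for best in best_others:
        for x, b in zip(solution, best):
            # Only the distance beyond the currently allowed range is penalised.
            excess += max(0.0, abs(x - b) - limit)
    return fitness - weight * excess
```

Early in a run the allowed range covers the whole normalised space, so each GA can pursue its own objective unpenalised; late in a run the same individual is penalised for lying far from the other populations' best solutions.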

The machine-generated numeric weightings resulting from the preference ranking 
introduced by the user can now modify the penalties inflicted. A heavy penalty 
inflicted upon a ‘much more important’ objective is therefore moderated in order to 
allow some influence upon the co-evolutionary search whereas such penalties upon 
objectives of lesser importance may not change or may be modified to take into 
account objective ranking. The effect of varying the relative importance of the Ferry 
Range objective via the Preference component is shown in figure 5(a), 5(b) and 5(c). 

In most real decision-making situations variables will have differing degrees of 
influence upon any given objective. An on-line sensitivity analysis which ranks 
variables according to their influence upon each objective has been introduced. This 
design sensitivity ranking is then used to adjust the fitness of each solution to ensure 
that the values of the most influential variables are within the range defined by the 
constraint map. Solutions are assigned the highest fitness penalty where their most 
influential variables lie outside of the current constraint map range. This ensures that 
populations contain high levels of compromise solutions in terms of the most 
influential variables and relatively redundant variables have little or no effect on 
overall solution fitness. The Taguchi method has been selected to determine the 
sensitivity of each input [16]. 

Again, the concentration is upon information-gathering and visual representation 
rather than the identification of Pareto optimal points. The co-evolutionary multi- 
objective approach provides the following information within one run of the process: 
high-performance solutions relating to the individual objectives; evolutionary 'tracks' 
that trace the Pareto surface to some extent; the bounds of a compromise region where 
all objectives will likely be best satisfied and the identification of influential and 
redundant variables relating to each objective. A more in-depth description of the 
process with results can be found in reference [17]. 

4.2.4 Agent-Based Support 

Stochastic population-based search generates a mass of information much of which is 
discarded. The intention is to introduce agents that monitor solutions generated from 
single or co-evolutionary processes and identify those that may be considered 
‘interesting’ by the decision-maker. The notion of ‘interesting’ may relate to: a good 
solution with a large Hamming or Euclidean distance from the majority of the 
population; a good solution that may satisfy the majority of constraints or objectives 
but is not satisfactory in a few; a not particularly high-performance solution where 
the constituent variable values lie within preferred ranges. 




Fig. 5. (a) Ferry Range is much more important 

(b) All objectives are of equal importance 

(c) Ferry Range is much less important 



The concept of ‘interesting’ may be largely subjective, which suggests that a degree 
of machine-learning may be appropriate where the responsible agents ‘learn’ from 
user reaction to possibly interesting solutions presented to them. This is obviously an 
area requiring extensive further research. 

Experimental negotiating agent systems utilising the rule-based preferences have 
been established for the identification of solutions that satisfy a range of design 
scenarios relating to multiple objectives and ideal variable values. For instance, the 
designer is likely to have several ideal scenarios such as: ‘I would like objective A 
to be greater than 0.6 and objective C to be less than 83.5; objectives B, D, E should 
be maximised; variable 2 should have a value of between 128.0 and 164.5; a value 
greater than 0.32 is preferred for variable 7, etc.’ The Incremental Agent operates as 
follows: use the designer’s original preferences for both objectives and scenarios and 
run the optimisation process; if some scenarios are not fulfilled, the agent suggests an 
increase in the importance of these scenarios; if some scenarios are still not fulfilled 
even when classed as ‘most important’, the agent suggests a change to the variable 
ranges in the scenario; if some scenarios are still not fulfilled, the agent reports to 
the designer and asks for assistance. 

The Incremental Agent strategies have been integrated with the Preferences and 
Co-evolutionary Multi-objective component on an experimental basis. Both the 
Scenario and Incremental Agent components sit between the designer and the 
Preference module drawing information from both. For a more detailed description of 
the processes and initial results based upon a miniCAPS example the reader is 
directed to Cvetkovic D. [18]. 

5 Discussion 

The paper discusses complexities relating to real-world multi-objective decision- 
making processes where the problem domain develops with information gained from 
evolutionary search and exploration. The intention has been to illustrate the manner 
in which EC can support such processes through the establishment of highly 
interactive systems where generated information provides problem insights which 
result in problem reformulation. An initial framework has been briefly outlined that 
supports the concept of an Interactive Evolutionary Design System (IEDS). Much 
further research and development is required to achieve relatively seamless 
development of the problem space where the decision-maker’s knowledge becomes 
embedded within an iterative human / evolutionary computational process. 

The entire concept moves away from the identification of non-dominated solutions 
and the generation of an n-dimensional Pareto frontier. The reasoning is that the 
inherent uncertainties and human-centred aspects of complex decision-making 
environments render such approaches less viable although their utility is 
well-founded in more well-defined problem areas. 

The intention is also to move away from the identification of solutions through the 
short-term application of evolutionary search techniques. The goal is a continuous, 
dynamic explorative process primarily made possible by the search and exploration 
capabilities of iterative designer / evolutionary systems. Although ambitious, it is 
suggested that such a concept could best utilise the processing capabilities of present 
and future computing technology during complex human / machine-based decision- 
making activities. 

The proposed architecture and IEDS concept provide an indication of what is 
possible in terms of interaction, information gathering and problem reformulation 
relating to variables and objectives. Further research may result in a much modified 
structure where agent technologies play a major and, to some extent, autonomous role 
to ensure appropriate communication and information processing capabilities. It is 
hoped that the concepts presented here will stimulate interest and such further 
research. 







References 



1. Goel A. K.: Design, Analogy and Creativity. IEEE Expert, Intelligent Systems and their 
Applications, 12 (3). (1997) 62 - 70 

2. Parmee I. C., Cvetkovic C., Watson A. H., Bonham C. R.: Multi-objective Satisfaction within an 
Interactive Evolutionary Design Environment. Evolutionary Computation. 8 (2), (2000) 197 - 222. 

3. Parmee I. C., Cvetkovic C., A. H., Bonham C. R., Packham I.: Introducing Prototype Interactive 
Evolutionary Systems for Ill-defined Design Environments. To be published in Journal of Advances 
in Engineering Software, Elsevier, (2001). 

4. Parmee I. C., Bonham C. R.: Towards the Support of Innovative Conceptual Design Through 
Interactive Designer / Evolutionary Computing Strategies. Artificial Intelligence for Engineering 
Design, Analysis and Manufacturing Journal; Cambridge University Press, 14, (1999) 3 - 16. 

5. Parmee I. C.: The Maintenance of Search Diversity for Effective Design Space Decomposition using 
Cluster-Orientated Genetic Algorithms (COGAs) and Multi-Agent Strategies (GAANT). Proceedings 
of 2nd International Conference on Adaptive Computing in Engineering Design and Control, PEDC, 
University of Plymouth; (1996) 128- 138. 

6. Parmee, I. C.: Cluster Oriented Genetic Algorithms (COGAs) for the identification of High 
Performance Regions of Design Spaces. First International Conference on Evolutionary Computation 
and its Applications, EvCA 96, Presidium of the Russian Academy of Sciences, Moscow; (1996) 66- 
75. 

7. Parmee, I.C.: The Development Of A Dual-Agent Strategy For Efficient Search Across Whole 
System Engineering Design Hierarchies. Proceedings of Parallel Problem Solving from Nature. 
(PPSN IV), Lecture Notes in Computer Science No. 1141; Springer-Verlag, Berlin (1996) 523-532. 

8. Parmee I. C.: Evolutionary and Adaptive Strategies for Efficient Search Across Whole System 
Engineering Design Hierarchies. Journal of Artificial Intelligence for Engineering Design, Analysis 
and Manufacturing; 12 (1998) 431 - 435. 

9. Roy R., Parmee I. C., Purchase G.: Integrating the Genetic Algorithm with the Preliminary Design of 
Gas Turbine Cooling Systems. Proceedings of 2nd International Conference on Adaptive Computing in 
Engineering Design and Control, PEDC, University of Plymouth (1996). 

10. Parmee I. C.: Evolutionary and Adaptive Computing in Engineering Design. Springer Verlag, London 
(2001). 

11. Roy R., Parmee I. C., Purchase G.: Qualitative Evaluation of Engineering Designs using Fuzzy Logic. 
Proceedings of ASME Design Engineering Technical Conferences and Computers in Engineering 
Conference, Irvine, California; (1996) 96-DETC/DAC-1449. 

12. Roy R., Parmee I. C.: Adaptive Restricted Tournament Selection for the Identification of Multiple 
Sub-optima in a Multi-modal Function. Lecture Notes in Computer Science, Evolutionary 
Computing; Springer-Verlag, (1996) 236-256. 

13. Bonham C. R., Parmee I. C.: An Investigation of Exploration and Exploitation in Cluster-oriented 
Genetic Algorithms. Proceedings of the Genetic and Evolutionary Computation Conference, Orlando, 
Florida, USA; (1999) 1491 - 1497. 

14. Fodor J., Roubens M.: Fuzzy Preference Modelling and Multi-criteria Decision Support. System 
Theory, Knowledge Engineering and Problem Solving, 14 ; Kluwer Academic Publishers (1994). 

15. Cvetkovic D., Parmee I. C.: Designer’s Preferences and Multi-objective Preliminary Design 
Processes. Evolutionary Design and Manufacture: Proceedings of the Fourth International 
Conference on Adaptive Computing in Design and Manufacture. Springer-Verlag (2000) 249 - 260. 

16. Peace G. S.: Taguchi Methods. Addison Wesley, Reading, MA (1992). 

17. Parmee I. C., Watson A. W.: Preliminary Airframe Design using Co-evolutionary Multi-objective 
Genetic Algorithms. Proceedings of the Genetic and Evolutionary Computation Conference, Orlando, 
Florida, USA; (1999) 1657 - 1665. 

18. Cvetkovic D.: Evolutionary Multi-objective Decision Support Systems for Conceptual Design. PhD 
Thesis, University of Plymouth (2000). 




Controlled Elitist Non-dominated Sorting 
Genetic Algorithms for Better Convergence 



Kalyanmoy Deb and Tushar Goel 



Kanpur Genetic Algorithms Laboratory (KanGAL) 
Indian Institute of Technology Kanpur 
Kanpur, PIN 208 016, India 
{deb, tusharg}@iitk.ac.in 
http://www.iitk.ac.in/kangal 



Abstract. Preserving elitism is found to be an important issue in the 
study of evolutionary multi-objective optimization (EMO). Although 
there exist a number of new elitist algorithms, where elitism is 
introduced in different ways, the extent of elitism is likely to be an 
important matter. The desired extent of elitism is directly related to the 
so-called exploitation-exploration issue of an evolutionary algorithm 
(EA). For particular recombination and mutation operators, there may 
exist a selection operator with a particular extent of elitism that will 
cause a smooth working of an EA. In this paper, we suggest an approach 
where the extent of elitism can be controlled by fixing a user-defined 
parameter. By applying an elitist multi-objective EA (NSGA-II) to a 
number of difficult test problems, we show that the NSGA-II with 
controlled elitism has a much better convergence property than the 
original NSGA-II. The need for controlled elitism in evolutionary 
multi-objective optimization, demonstrated in this paper, should 
encourage similar or other ways of implementing controlled elitism in 
other multi-objective evolutionary algorithms. 



1 Introduction 

It is now well established through a number of studies that elitist 
multi-objective evolutionary algorithms (MOEAs) have better convergence 
characteristics than non-elitist MOEAs. Motivated by these studies, researchers 
and practitioners now concentrate on developing and using elitist MOEAs. This has 
resulted in a number of elitist MOEAs, such as strength Pareto EA or SPEA, 
Pareto-archived evolution strategy or PAES, and others. After the early 
suggestion of a non-elitist multi-objective genetic algorithm (NSGA), the first 
author and his students have also suggested an elitist, fast, and parameter-free 
multi-objective GA (NSGA-II) [2,4]. All these elitist algorithms are interesting 
and have shown tremendous potential in solving multi-objective optimization 
problems. 

In the context of single objective optimization, elitism is introduced in a 
number of different ways. Preserving elitism means emphasizing the currently-best 
solutions in the subsequent generations. In some implementations, this is 
achieved by simply carrying over the best e% of the population to the next 
generation. In other implementations, the parent and child populations are combined 
together and the best 50% of the population is retained. This way previously-found 
good solutions are given a chance to carry over to subsequent generations. 
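
The two single-objective elitism schemes just described can be sketched as below. This is only an illustration: the function names are hypothetical, individuals are arbitrary Python objects, and a higher fitness value is assumed to be better.

```python
def carry_over_elites(population, fitness, e_percent):
    """Return the best e% of the population, to be copied unchanged
    into the next generation (at least one individual is kept)."""
    k = max(1, round(len(population) * e_percent / 100))
    return sorted(population, key=fitness, reverse=True)[:k]


def best_half_of_combined(parents, children, fitness):
    """Combine parent and child populations and retain the best 50%."""
    combined = parents + children
    return sorted(combined, key=fitness, reverse=True)[: len(combined) // 2]
```

In the second scheme a parent survives only if it is still among the best half of the combined population, so good parents are protected without being guaranteed a place indefinitely.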

In the context of multi-objective optimization, the meaning of elite solutions 
is different from that in single objective optimization. Here, all solutions that 
belong to the currently-best non-dominated front are best solutions in the 
population and are all equally important. Thus, all these solutions are elite solutions. 
On many occasions, a population may be mostly comprised of currently-best 
non-dominated solutions. When this happens, the preservation of elitism means 
acceptance of all such solutions. In such a scenario, not many new solutions 
can be accepted in the population. As a result, the search process may stagnate 
or prematurely converge to a suboptimal solution set. Thus, there is a need 
to introduce elitism in a controlled manner in the context of multi-objective 
optimization. 

In this paper, we address the issue of controlling elitism from the point of 
maintaining a balance between underlying exploitation and exploration issues. 
Thereafter, we argue that our earlier elitist implementation NSGA-II can have 
uncontrolled elitism in tackling certain problems. We suggest a controlled elitist 
approach, where only a certain portion of the population is allowed to contain 
the currently-best non-dominated solutions. The dominated solutions are pur- 
posefully kept in the population to reduce the elitism effect. By introducing a 
parameter to control the extent of elitism, we study its effect on five complex 
test problems. In all simulation results, it is clear that the controlled NSGA-II 
has a better convergence property than the original NSGA-II with uncontrolled 
elitism. 



2 Elitist Non-dominated Sorting Genetic Algorithm 
(NSGA-II) 

The details of the NSGA-II algorithm appear elsewhere [2]. Essentially, NSGA-II 
differs from our original NSGA implementation in a number of ways. Firstly, 
NSGA-II uses an elite-preserving mechanism, thereby assuring preservation of 
previously found good solutions. Secondly, NSGA-II uses a fast non-dominated 
sorting procedure. Thirdly, NSGA-II does not require any tunable parameter, 
thereby making the algorithm independent of the user. 

Initially, a random parent population P_0 is created. The population is sorted 
based on non-domination. A special book-keeping procedure is used in order 
to reduce the computational complexity to O(MN^2). Each solution is assigned 
a fitness equal to its non-domination level. Binary tournament selection, 
recombination, and mutation operators are used to create a child population Q_0 of 
size N. Thereafter, we use the algorithm shown in Figure 1 in every generation. 

First, a combined population R_t = P_t ∪ Q_t is formed. This allows parent 
solutions to be compared with the child population, thereby ensuring elitism. 
The population R_t is of size 2N. Then, the population R_t is sorted according to 
non-domination and different non-dominated fronts F_1, F_2, and so on are 
found. The algorithm is illustrated in the following: 

    R_t = P_t ∪ Q_t 
    F = fast-non-dominated-sort(R_t) 
    P_{t+1} = ∅ and i = 1 
    until |P_{t+1}| + |F_i| < N 
        P_{t+1} = P_{t+1} ∪ F_i 
        crowding-distance-assignment(F_i) 
        i = i + 1 
    Sort(F_i, ≺_n) 
    P_{t+1} = P_{t+1} ∪ F_i[1 : (N − |P_{t+1}|)] 
    Q_{t+1} = make-new-pop(P_{t+1}) 
    t = t + 1 

Fig. 1. NSGA-II algorithm is shown. 

The new parent population P_{t+1} is formed by adding solutions from the 
first front F_1 and continuing to other fronts successively till the size exceeds 
N. Individuals of each front are used to calculate the crowding distance, that is, the 
distance between the neighboring solutions. Thereafter, the solutions of the 
last accepted front are sorted according to a crowded comparison criterion and a 
total of N points are picked. Since the diversity among the solutions is important, 
the crowded comparison criterion uses a relation ≺_n as follows: 



Definition 1 Solution i is better than solution j in relation ≺_n 
if (i_rank < j_rank) or ((i_rank = j_rank) and (i_distance > j_distance)). 



That is, between two solutions with differing non-domination ranks we prefer 
the point with the lower rank. Otherwise, if both the points belong to the same 
front then we prefer the point which is located in a region with a smaller number 
of points (or with a larger crowding distance). This way solutions from less dense 
regions in the search space are given importance in deciding which solutions to 
choose from R_t. This constructs the population P_{t+1}. This population of size N is 
now used for selection, crossover and mutation to create a new population Q_{t+1} of 
size N. We use a binary tournament selection operator but the selection criterion 
is now based on the crowded comparison operator ≺_n. The above procedure is 
continued for a specified number of generations. 
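
The survivor-selection step described above can be sketched as follows. This is a compact illustration, not the authors' code: all objectives are assumed to be minimised, solutions are represented only by their objective vectors, and the function names are chosen here for readability.

```python
def dominates(a, b):
    """a dominates b: no worse in every objective, strictly better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def fast_non_dominated_sort(objs):
    """Return fronts as lists of indices into objs, best front first."""
    n = len(objs)
    dominated = [[] for _ in range(n)]  # indices that solution i dominates
    count = [0] * n                     # number of solutions dominating i
    for i in range(n):
        for j in range(n):
            if dominates(objs[i], objs[j]):
                dominated[i].append(j)
            elif dominates(objs[j], objs[i]):
                count[i] += 1
    fronts = [[i for i in range(n) if count[i] == 0]]
    while fronts[-1]:
        nxt = []
        for i in fronts[-1]:
            for j in dominated[i]:
                count[j] -= 1
                if count[j] == 0:
                    nxt.append(j)
        fronts.append(nxt)
    return fronts[:-1]  # drop the trailing empty front


def crowding_distance(front, objs):
    """Crowding distance of each index in one front (boundaries get inf)."""
    dist = {i: 0.0 for i in front}
    for m in range(len(objs[front[0]])):
        order = sorted(front, key=lambda i: objs[i][m])
        lo, hi = objs[order[0]][m], objs[order[-1]][m]
        dist[order[0]] = dist[order[-1]] = float("inf")
        if hi == lo:
            continue
        for k in range(1, len(order) - 1):
            gap = objs[order[k + 1]][m] - objs[order[k - 1]][m]
            dist[order[k]] += gap / (hi - lo)
    return dist


def crowded_better(i, j, rank, dist):
    """Definition 1: lower rank wins; ties are broken by larger distance."""
    return rank[i] < rank[j] or (rank[i] == rank[j] and dist[i] > dist[j])


def next_parents(objs, n):
    """Pick n survivors from the combined population R_t (indices into objs)."""
    chosen = []
    for front in fast_non_dominated_sort(objs):
        if len(chosen) + len(front) <= n:
            chosen.extend(front)  # the whole front fits
        else:
            dist = crowding_distance(front, objs)
            front = sorted(front, key=lambda i: dist[i], reverse=True)
            chosen.extend(front[: n - len(chosen)])  # least crowded first
            break
    return chosen
```

Note that when the first front alone holds more than N members, every survivor comes from that front; this is the behaviour the following sections set out to control.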

It is clear from the above description that NSGA-II uses (i) a faster 
non-dominated sorting approach, (ii) an elitist strategy, and (iii) no niching 
parameter. It has been shown elsewhere [2] that the above procedure has O(MN^2) 
computational complexity. 



2.1 Exploitation Versus Exploration Issue 

The above description of NSGA-II raises an important issue relating to EA 
research: the issue of exploitation and exploration. Let us imagine that at a 
generation, we have a population R_t where most of the members lie on the 
non-dominated front of rank one and this front is distant from the true Pareto-optimal 
front. This will happen in the case of multi-modal problems, where a population 
can get attracted to a local Pareto-optimal front. Since most members belong to 
the current best non-dominated front, the elitism operation will result in deleting 
most members belonging to other fronts. In the above NSGA-II algorithm, elitist 
solutions are emphasized on two occasions: once in the make-new-pop(P_{t+1}) 
operation and again during the elitism operation in the until loop. The former 
operation involves the crowded tournament selection operator, which emphasizes 
the elitist solutions (the currently-best non-dominated solutions). In the latter 
case, solutions are selected starting from the currently-best non-dominated 
solutions till all population slots are filled. This way, the elitist solutions also get 
emphasized. This dual emphasis of elitist solutions will cause a rapid deletion 
of solutions belonging to non-elitist fronts. Although the crowding tournament 
operator will ensure diversity along the current non-dominated front, lateral 
diversity will be lost. In many problems, when this happens the search slows down, 
simply because there may be a lack of diversity in certain decision variables left 
to push the search towards better regions of optimality. Thus, in order to ensure 
better convergence, a search algorithm may need diversity in both directions: 
along the Pareto-optimal front and lateral to the Pareto-optimal front, as shown 
in Figure 2. In the test problems suggested elsewhere, the lateral diversity is 
ensured by the functional g() with (n−1) decision variables. Solutions converging 
to any local Pareto-optimal front will make all these decision variables take 
an identical value. The dual elitism will reduce the variability in these solutions 
and eventually prevent the algorithm from moving towards the true Pareto-optimal 
front. 

Fig. 2. Controlled elitism procedure is illustrated. 

The difficulty arises because of the way elitism is implemented in NSGA-II. 
There is no control on the exploitation of the currently best non-dominated 
solutions. From the above discussion, it is clear that in certain complex functions, 
NSGA-II, in the absence of a lateral diversity-preserving operator such as mutation, 
causes too much exploitation of currently-best non-dominated solutions. In 
order to counteract this excessive selection pressure, an adequate exploration by 
means of the search operators must be used. A proper balance of these two issues 
is not possible to achieve with the uncontrolled elitism mechanism in NSGA-II. 
In an earlier study [2], we have shown that in the test problem ZDT4 with 
Rastrigin’s multi-modal function as the g functional, NSGA-II could not converge 
to the global Pareto-optimal front. However, when a mutation operator with a 
larger mutation strength is used, NSGA-II succeeds in converging to the global 
Pareto-optimal front. Increasing variability through mutation enhances the 
exploration power of NSGA-II and a balance between enhanced exploitation of 
NSGA-II and the modified exploration can be maintained. 

Although many researchers have adopted an increased exploration require- 
ment by using a large mutation strength in the context of single objective EAs, 
the extent of needed exploration is always problem dependent. In the following 
section, instead of concentrating on changing the search operators, we suggest a 
controlled elitism mechanism which will control the extent of exploitation rather 
than controlling the extent of exploration. 



3 Controlled Approach 



In the proposed controlled NSGA-II, we restrict the number of individuals in 
the currently best non-dominated front adaptively. We attempt to maintain a 
pre-defined distribution of number of individuals in each front. Specifically, we 
use a geometric distribution for this purpose: 

    n_i = r n_{i-1},    (1) 

where n_i is the maximum number of allowed individuals in the i-th front and 
r (< 1) is the reduction rate. Although the parameter r is user-defined, the 
procedure is adaptive as follows. 

First, the population R_t = P_t ∪ Q_t is sorted for non-domination. Let us 
say that the number of non-dominated fronts in the population (of size 2N) 
is K. Thus, according to the geometric distribution, the maximum number of 
individuals allowed in the i-th front (i = 1, 2, ..., K) in the new population of 
size N is 

    n_i = N \frac{1 - r}{1 - r^K} r^{i-1}.    (2) 



Since r < 1, the maximum allowable number of individuals in the first front 
is highest. Thereafter, each front is allowed to have an exponentially reducing 
number of solutions. The exponential distribution considered above is an 
assumption, and other distributions, such as an arithmetic distribution or a 
harmonic distribution, should also be tried. But the principle in all these approaches 
is the same: forcibly allow solutions from all non-dominated fronts to co-exist in 
the population. 
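
As a small worked illustration of the geometric distribution, the sketch below evaluates the per-front limits for given N, r and K. The function name `front_quotas` is hypothetical, and the values are left real-valued because the text does not say how fractional limits are rounded.

```python
def front_quotas(N, r, K):
    """Maximum allowed individuals n_i in fronts i = 1..K.

    n_i = N * (1 - r) / (1 - r**K) * r**(i - 1), with 0 < r < 1,
    so that the limits shrink by a factor r per front and sum to N.
    """
    scale = N * (1 - r) / (1 - r ** K)
    return [scale * r ** i for i in range(K)]


# N = 100, r = 0.5, K = 4 gives roughly 53.3, 26.7, 13.3 and 6.7.
quotas = front_quotas(100, 0.5, 4)
```

A smaller r concentrates the population in the first front (closer to the original NSGA-II), while r close to 1 spreads the slots almost evenly over all K fronts.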

Although equation (2) denotes the maximum allowable number of individuals
$n_i$ in each front i, in a population there may not exist exactly $n_i$ individuals
in front i. We resolve this problem by starting a procedure from the first front.
The number of individuals in the first front is counted. Let us say that there
are $n_1^t$ individuals. If $n_1^t > n_1$ (that is, there are more solutions than allowed),
we only choose $n_1$ solutions using the crowded tournament selection. This way,
exactly $n_1$ solutions that reside in less crowded regions are selected. On the
other hand, if $n_1^t < n_1$ (that is, there are fewer solutions in the population than
allowed), we choose all $n_1^t$ solutions and count the number of remaining slots
$\rho_1 = n_1 - n_1^t$. The maximum allowed number of individuals in the second front
is now increased to $n_2 + \rho_1$. Thereafter, the actual number of solutions
$n_2^t$ present in the second front is counted and is compared with the increased
limit as above.
This procedure is continued till all N individuals are selected. Figure 3 shows
that a population of size 2N (having four non-dominated fronts with topmost
subpopulation representing front one and so on) is reduced to a new population
$P_{t+1}$ of size N using the above procedure. In the transition shown on the right,
all four fronts have representative solutions. Besides this elitism procedure, the
rest of the procedure is kept the same as that in NSGA-II. The figure also shows



Fig. 3. Controlled elitism procedure is illustrated (panels: NSGA-II and
controlled NSGA-II).



the new population $P_{t+1}$ which would have been obtained using the usual NSGA-II
procedure. It is clear that the new population under the controlled NSGA-II is
more diverse than that under NSGA-II.

Since the population is halved, it is likely that in each front there would be
more solutions than allowed. However, there could be some situations where after
all 2N solutions are processed as above, there are still some slots left in the new
population to be filled. This may happen particularly when r is large. In such
cases, we make another pass with the left-out individuals from the first front,
continuing to other fronts, and start including them till we fill up the remaining
slots.
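
The slot allocation described above can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the names `front_quotas` and `controlled_counts` are hypothetical, only the number of individuals taken from each front is computed (the crowded tournament selection that decides which individuals survive is left out), and rounding the real-valued limits of equation (2) down while carrying the fractional remainder forward is an assumption, since the text does not say how non-integer limits are handled.

```python
import math


def front_quotas(N, K, r):
    """Limits n_i of equation (2) for K fronts, population size N, 0 <= r < 1."""
    scale = N * (1.0 - r) / (1.0 - r ** K)
    return [scale * r ** i for i in range(K)]


def controlled_counts(front_sizes, N, r):
    """Number of individuals kept from each front (front_sizes[0] is the best front)."""
    K = len(front_sizes)
    quotas = front_quotas(N, K, r)
    taken = [0] * K
    carry = 0.0  # unused slots handed on to the next front
    for i in range(K):
        allowed = quotas[i] + carry
        # Assumption: real-valued limits are rounded down to whole individuals.
        taken[i] = min(front_sizes[i], math.floor(allowed + 1e-9))
        carry = allowed - taken[i]
    # Second pass: fill any remaining slots with left-out individuals,
    # starting again from the first front.
    remaining = N - sum(taken)
    for i in range(K):
        if remaining <= 0:
            break
        extra = min(front_sizes[i] - taken[i], remaining)
        taken[i] += extra
        remaining -= extra
    return taken


if __name__ == "__main__":
    # Four fronts in a combined population of 2N = 200, reduced to N = 100.
    print(controlled_counts([120, 50, 20, 10], N=100, r=0.5))
```

With r = 0 the first limit equals N, so the sketch degenerates to filling the new population from the best front first.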

3.1 Discussions 

As mentioned earlier, keeping individuals from many non-dominated fronts in
the population helps the recombination operator to create diverse solutions.
NSGA-II and many other successful MOEAs aim at maintaining diversity
among solutions within individual non-dominated fronts. The controlled elitism
procedure suggested above will help to maintain diversity in solutions across the 
non-dominated fronts. In solving difficult multi-objective optimization problems, 
this additional feature may be helpful in progressing towards the true Pareto- 
optimal front. 

It is intuitive that the parameter r is important in maintaining the correct
balance between the exploitation and exploration issue discussed earlier. This
parameter sets the extent of exploration allowed in an MOEA. If r is small,
most slots go to the best front (with r = 0, equation (2) assigns all N slots to
it) and the extent of exploration is small; if r is close to one, the slots are
spread almost evenly over the fronts and the extent of exploration is large. In
general, the optimal value of r will depend on the problem and will be difficult
to find theoretically. In the following section, we present simulation results
on a number of difficult problems to investigate if there exists any range of r
where the controlled NSGA-II performs well.

4 Simulation Results 

In order to demonstrate the working of the proposed methodology, we use a set 
of test problems. The first four problems have multiple Pareto-optimal fronts,
of which only one is the globally Pareto-optimal set. Next, we consider one
constrained test problem, which has infeasible regions restricting the
search towards the true Pareto-optimal front.

In order to investigate the effect of the controlled elitism alone, we do not use 
the mutation operator. For controlled NSGA-II runs, we do not use the selection 
operator in make_new_pop() to create the child population. For all problems, 
we use a population size of 100, a crossover probability of 0.95, and spread 
factor for the SBX operator of 20. All simulation runs are continued until 200 
generations are over. 

4.1 Multi-objective Rastrigin’s Problem 

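This problem is identical to the ZDT4 function, introduced elsewhere:

ZDT4:
    Minimize $f_1(\mathbf{x}) = x_1$,
    Minimize $f_2(\mathbf{x}) = g(\mathbf{x}) \left[ 1 - \sqrt{x_1 / g(\mathbf{x})} \right]$,
    $g(\mathbf{x}) = 91 + \sum_{i=2}^{10} \left[ x_i^2 - 10 \cos(4 \pi x_i) \right]$,
    $x_1 \in [0, 1]$, $x_i \in [-5, 5]$, i = 2, ..., 10.    (3)
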
It was also observed in another study [2] that NSGA-II has difficulties in
converging to the global Pareto-optimal front. However, when the mutation strength
is increased, NSGA-II could converge to the correct front.
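
For readers who want to experiment with this test problem, the ZDT4 definition above can be written as a short Python function. This is a minimal sketch; the function name `zdt4` and the choice to return g alongside the two objectives are illustrative assumptions, made because the simulation results are reported in terms of the functional g().

```python
import math


def zdt4(x):
    """Evaluate ZDT4 for x = [x1, ..., x10]; returns (f1, f2, g)."""
    x1, rest = x[0], x[1:]
    # g = 1 + 10 * (n - 1) + sum(...), which is 91 + sum(...) for n = 10.
    g = 1.0 + 10.0 * len(rest) + sum(
        xi * xi - 10.0 * math.cos(4.0 * math.pi * xi) for xi in rest
    )
    f1 = x1
    f2 = g * (1.0 - math.sqrt(x1 / g))
    return f1, f2, g


if __name__ == "__main__":
    # A point on the global Pareto-optimal front: x2..x10 = 0 gives g = 1.
    print(zdt4([0.25] + [0.0] * 9))
```

Setting x_2 to x_10 to zero gives g = 1, the value that identifies the global Pareto-optimal front in the discussion that follows.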

Figure 4 shows that a simulation run with NSGA-II could not converge to the
true Pareto-optimal front (the plot shown with diamonds). Since mutation is not 





Fig. 4. Distribution of obtained set of solutions for ZDT4.
Fig. 5. Average g() for controlled and uncontrolled NSGA-IIs for ZDT4.



used, NSGA-II also faced difficulty in maintaining a diverse set of solutions in 
the obtained front. The figure also shows two other simulations with controlled 
elitism (r = 0.2 and r = 0.9), starting from the same initial population. The 
figure clearly shows that when controlled elitism is used, existence of multiple 
fronts ensures better convergence and better diversity among obtained solutions. 
To make a thorough investigation, we have used 19 controlled elitist NSGA-IIs
in the range r ∈ [0.05, 0.9] with an interval of 0.05 and compared the results
with the original NSGA-II. For each NSGA-II, we have performed 25 simulation
runs, each starting with a different initial population. But for all NSGA-IIs,
the same 25 initial populations are used. The average value of the best functional
g() in 200 generations is calculated and plotted in Figure 5. Equation (3) reveals
that all solutions having g = 1 lie in the global Pareto-optimal front. Thus, an MOEA
with a smaller value of functional g() can be considered to have done well in
solving the above problem. The figure clearly shows that all controlled elitism
runs have done much better than the original NSGA-II. The average g() value
is much closer to one than that of NSGA-II. Although there are some fluctuations in
the obtained g() values for different settings of the reduction rate, all controlled
NSGA-IIs are better.

In order to investigate the composition of population with controlled elitism,
we count the number of solutions in the first and the second fronts in 25 different
runs of NSGA-II and of controlled NSGA-II with r = 0.5. The average of these
numbers is plotted in Figures 6 and 7. Figure 6 shows that in the case of original
NSGA-II, the number of solutions in the first front grows very rapidly after
a certain number of generations. Within about 90 generations, all population
members belong to the best non-dominated front. Thereafter, NSGA-II works
by keeping all population members in the first front. A new solution is then
accepted only if it lies in the current best non-dominated front and if
it resides in a less crowded area. On the other hand, with controlled NSGA-II
(with r = 0.5) the population is never filled with currently-best non-dominated
solutions alone. With r = 0.5, only about 50% of the population members belong to
the best non-dominated front. Moreover, the rate of increase of solutions in the
best non-dominated front is also small and, importantly, the growth of solutions
in this front begins much later than that with the original NSGA-II. Figure 7





Fig. 6. Average number of solutions in the first (best) non-dominated front.
Fig. 7. Average number of solutions in the second non-dominated front.



shows the number of solutions in the second front. The figure clearly shows that 
in the original NSGA-II, there exists no member in the second front after about 
90 generations, meaning that all solutions belong to the first non-dominated 
front. But, in the case of controlled NSGA-II, about 25% of the population belongs
to the second front. The presence of so many solutions in the second front and 
subsequent fronts provides a lateral diversity needed for a better convergence to 
the global Pareto-optimal front. 

We would like to distinguish the original NSGA-II algorithm from the controlled
NSGA-II having r = 0. With r = 0, the allowed number of solutions in the best
non-dominated front is N (the population size). Thus, there is no pressure to
maintain other fronts in the population. This algorithm may apparently look
the same as the original NSGA-II, but there is a difference. In the r = 0
controlled NSGA-II, the tournament selection in the make_new_pop() operator is
eliminated. This way, the selection pressure for currently-best elitist solutions
is smaller than that in the original NSGA-II. Although this simple change in
NSGA-II exhibits a better performance, the controlled NSGA-IIs with r > 0 are
more generic. Moreover, NSGA-IIs with some values of r > 0 show better performance
than NSGA-II with r = 0. It is a matter of future research to investigate if a
dynamically changing r with generation makes the convergence better than a
fixed r. However, the effect of r is evident in the following problem.



4.2 Biased Test Function 



Next, we construct a test problem using equation (3) with the functional g() as
given by the following equation (4):

$g(\mathbf{x}) = 1 + 9 \left( \dfrac{\sum_{i=2}^{30} x_i - \sum_{i=2}^{30} x_i^{\min}}{\sum_{i=2}^{30} x_i^{\max} - \sum_{i=2}^{30} x_i^{\min}} \right)^{\gamma}$,    (4)

where γ = 0.25. Here, the variables $x_i$ are allowed to vary in the limits [0, 1].
The function has the property of having more solutions away from the
Pareto-optimal front, thus making it increasingly difficult for the search to
reach the Pareto-optimal front. The globally Pareto-optimal front of this
function has the functional g(x) = 1.

Figure 8 shows a typical distribution of solutions obtained by NSGA-II
runs and by different controlled elitist NSGA-IIs on this function without mutation.
Here, the weak definition of dominance is used. It is clear from Figure 9




Fig. 8. Distribution of obtained set of solutions for ZDT4 with a biased function.
Fig. 9. Average g() for controlled and uncontrolled NSGA-IIs for ZDT4 with a
biased function.



that controlling elitism is useful. The best performance of the MOEA is achieved
at r = 0.65. The performance of NSGA-II improves with r, but after a critical r
the performance deteriorates. For small r, not enough fronts exist in the
population, thereby slowing the performance. On the other hand, when r is large, not
much selection pressure is assigned to the current best non-dominated solutions.
Figure 8 shows the lateral diversity for a typical run for r = 0, r = 0.65 and
the original NSGA-II runs. It is clear that diversity is maintained
in all the fronts. There exist a number of points in the run for r = 0.65 which
are not on the best non-dominated front. These are the points which exist on
the different fronts and provide the needed diversity to help NSGA-II proceed
towards the Pareto-optimal region. The figure confirms the concept of letting
different fronts survive simultaneously to achieve better convergence.

4.3 Multi-objective Griewangk Problem 

Next, we construct a problem using equation (3), except that the functional g()
is now different (g() is Griewangk's function):

$g(\mathbf{x}) = 2 + \sum_{i=2}^{10} x_i^2 / 4000 - \prod_{i=2}^{10} \cos(x_i / \sqrt{i})$.    (5)

Here, the variables $x_2$ to $x_{10}$ are all allowed to vary in [-512, 511]. This
function has $163^9$ local Pareto-optimal fronts, of which only one is the globally
Pareto-optimal front, having g(x) = 1.

Figure 10 shows a typical performance plot of NSGA-II and controlled elitist
NSGA-IIs on this function without mutation. It is clear that controlled elitism 

Fig. 10. Distribution of obtained set of solutions for ZDT4 with Griewangk's
function.
Fig. 11. Average g() for controlled and uncontrolled NSGA-IIs for ZDT4 with
Griewangk's function.



runs are better converged near the true Pareto-optimal front. Because of lateral 
diversity present in the controlled NSGA-II runs, diversity in the obtained set of 
solutions also comes as a bonus. Figure 11 shows a detailed comparative study.
Once again, the figure confirms that the controlled NSGA-II results are better.

4.4 Nonuniform Distribution



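We now use the problem ZDT6, which was used in another study:

ZDT6:
    Minimize $f_1(\mathbf{x}) = 1 - \exp(-4 x_1) \sin^6(6 \pi x_1)$,
    Minimize $f_2(\mathbf{x}) = g(\mathbf{x}) \left[ 1 - (x_1 / g(\mathbf{x}))^2 \right]$,
    $g(\mathbf{x}) = 1 + 9 \left[ \left( \sum_{i=2}^{10} x_i \right) / 9 \right]^{0.25}$,
    $x_1 \in [0, 1]$, $x_i \in [-5, 5]$, i = 2, ..., 10.    (6)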



The above problem should not cause much difficulty in converging to the
correct front (g(x) = 1), but causes difficulty in maintaining a uniform spread of
solutions in the front.

Even in this problem, we observe from Figures 12 and 13 that the original
NSGA-II without the help of mutation cannot come closer to the Pareto-optimal
front. Moreover, the distribution of solutions in the obtained set is also
non-uniform. Controlled NSGA-IIs perform better than the original NSGA-II in
both converging near the true Pareto-optimal front and maintaining a
uniform-like distribution. Figure 12 shows that NSGA-II's crowding mechanism is good
enough to maintain a good distribution of solutions in this problem. But
NSGA-II's large selection pressure through uncontrolled elitism is detrimental. Once
again, the performance of controlled NSGA-IIs is more or less insensitive to the
chosen r value.

Fig. 12. Distribution of obtained set of solutions for ZDT6.
Fig. 13. Average g() for controlled and uncontrolled NSGA-IIs for ZDT6.

4.5 Constrained Test Problem (CTP7)

Finally, we attempt to solve a constrained test problem shown below:

CTP7:
    Minimize $f_1(\mathbf{x}) = x_1$,
    Minimize $f_2(\mathbf{x}) = g(\mathbf{x}) \left( 1 - \dfrac{f_1(\mathbf{x})}{g(\mathbf{x})} \right)$,
    Subject to $c(\mathbf{x}) \equiv \cos(\theta)(f_2(\mathbf{x}) - e) - \sin(\theta) f_1(\mathbf{x}) \geq a \left| \sin\left( b \pi \left( \sin(\theta)(f_2(\mathbf{x}) - e) + \cos(\theta) f_1(\mathbf{x}) \right)^c \right) \right|^d$.

The decision variable $x_1$ is restricted to [0, 1]. The functional g() is
Rastrigin's function used in equation (3). But here we use five variables, instead
of the 10 used in ZDT4. Other parameter values are as follows:

θ = -0.05π, a = 40, b = 5, c = 1, d = 6, e = 0.

The feasible search space and the corresponding disconnected Pareto-optimal
regions are shown in Figure 14.

Fig. 14. Constrained test problem CTP7.



In the absence of a mutation operator, we observe that NSGA-II's strong
elitism prevents the algorithm from coming closer to the true Pareto-optimal front
(Figure 15). The outcome of 25 runs suggests that the controlled elitism can
overcome this problem somewhat and help NSGA-II to converge closer to the
true Pareto-optimal front (Figure 16). The absence of mutation in all NSGA-II
runs prevents them from maintaining a diverse set of population members in the
obtained set.

Fig. 15. Distribution of obtained set of solutions for CTP7 (horizontal axis: f_1).
Fig. 16. Average g() for controlled and uncontrolled NSGA-IIs for CTP7
(horizontal axis: reduction ratio, r).



5 Conclusion 

In this paper, we have discussed the need to strike a balance between the
exploitation offered by a selection operator along with an elite-preserving
mechanism and the exploration offered by a recombination operator. Although
NSGA-II has been
largely successful in many problems, in some difficult problems, it showed diffi- 
culty in converging to the Pareto optimal front. Here, we have clearly shown that 
the uncontrolled elitism present in NSGA-II produces a large selection pressure 
for currently-best non-dominated solutions. In difficult multi-modal problems 
and other problems, this forces NSGA-II without any mutation operator to have 
a premature convergence to a suboptimal front. 

By forcing a number of non-dominated fronts to co-exist in an NSGA-II, 
we have shown that the selection pressure offered by NSGA-II’s elitism operator 
can be controlled. On five difficult test problems, including a difficult constraint- 
handling problem, we have shown that the controlled NSGA-II has a better 
convergence property than the original NSGA-II. Since diverse non-dominated 
fronts are also maintained in the population, controlled NSGA-II also produces 
a better distribution of solutions than the original NSGA-II. 

The controlled elitism mechanism implemented here with NSGA-II is generic 
and can also be implemented with other elitist MOEAs. It would be a matter of 
future research to investigate whether the convergence property of other MOEAs 
improves with such a controlled elitism operator. 

Abstract. When we try to implement a multi-objective genetic algorithm
(MOGA) with variable weights for finding a set of Pareto optimal solutions, 
one difficulty lies in determining appropriate search directions for genetic 
search. In our MOGA, a weight value for each objective in a scalar fitness 
function was randomly specified. Based on the fitness function with the 
randomly specified weight values, a pair of parent solutions are selected for 
generating a new solution by genetic operations. In order to find a variety of 
Pareto optimal solutions of a multi-objective optimization problem, weight 
vectors should be distributed uniformly on the Pareto optimal surface. In this 
paper, we propose a proportional weight specification method for our MOGA 
and its variants. We apply the proposed weight specification method to our 
MOGA and a cellular MOGA for examining its effect on their search ability. 



1 Introduction 

Genetic algorithms have been successfully applied to various optimization problems 
[1]. The extension of GAs to multi-objective optimization was proposed in several 
manners (for example, Schaffer [2], Kursawe [3], Horn et al. [4], Fonseca & Fleming 
[5], Murata & Ishibuchi [6], Zitzler & Thiele [7]). The aim of these algorithms is to 
find a set of Pareto-optimal solutions of a multi-objective optimization problem. 
Another issue in multi-objective optimization is to select a single final solution from 
Pareto-optimal solutions. Many studies on multi-objective GAs did not address this 
issue because the selection totally depends on the decision maker’s preference. In this 
paper, we also concentrate our attention on the search for finding a set of Pareto- 
optimal solutions. 

In this paper, we try to improve the search ability of our multi-objective genetic 
algorithm (MOGA) in [6] and its variants (i.e., extensions of our MOGA). Fig. 1 
shows some extended algorithms in our previous studies [8-10]. By hybridizing our 
MOGA with a local search procedure, we have already extended it to a multi- 
objective genetic local search algorithm (MOGLS [8]). We have also extended our 
MOGA to a cellular multi-objective genetic algorithm (C-MOGA [9]) by introducing 
a cellular structure. We have employed a local search procedure and a cellular
structure in a cellular MOGLS [10]. Furthermore we have extended the cellular 
algorithms by introducing a relocation procedure (i.e., a kind of immigration) in [10]. 
Each individual is relocated to a cell at every generation based on the values of 
multiple objectives (i.e., the location in the multi-dimensional objective space). Those 
extended algorithms, which are based on the cellular structure and the immigration 
procedure, are referred to as Cellular Immigrative (“CI-”) algorithms in Fig. 1. 



[Figure: diagram relating MOGA, MOGLS, C-MOGA, C-MOGLS, CI-MOGA and CI-MOGLS;
the transitions are labelled local search, cellular structure and immigration.]

Fig. 1. Extensions of our MOGA by introducing local search, cellular structures and
immigration procedures



When we try to implement our MOGA and its variants with variable weights, one 
difficulty lies in determining appropriate search directions for genetic search. In those 
algorithms, the weight value for each objective in a scalar fitness function is randomly 
specified. Based on the scalar fitness function with the randomly specified weight 
values, a pair of parent solutions are selected for generating a new solution by genetic 
operations. In order to find a variety of Pareto optimal solutions of a multi-objective 
optimization problem, a proportional weight specification method is more desirable 
than the random specification method. In this paper, we propose a proportional weight 
specification method for our MOGA and its variants. We apply the proposed weight 
specification method to our MOGA and a cellular MOGA for examining its effect on 
their search ability. 

The concept of cellular genetic algorithms was proposed by Whitley [11]. In 
cellular genetic algorithms, each individual (i.e. a chromosome) resides in a cell of a 
spatially structured space. Genetic operations for generating new individuals are 
locally performed in the neighborhood of each cell. While the term “cellular genetic 
algorithm” was introduced by Whitley, such algorithms had already been proposed by 
Manderik & Spiessens [12]. A similar concept was also studied in evolutionary 
ecology in the framework of “structured demes” (Wilson [13], Dugatkin & 
Mesterton-Gibbons [14]). The effect of spatial structures on the evolution of 
cooperative behavior has also been examined in many studies (e.g., Nowak & May 
[15], Wilson et al. [16], Oliphant [17], Grim [18], and Ishibuchi et al. [19]) where 
each individual was located in a cell of single-dimensional or two-dimensional grid- 
worlds. The concept for generating a grid world on an n-objective space is also 
employed in the Pareto Archived Evolution Strategy (PAES) [20], where each 
individual is located in a grid on the objective space. The PAES employs a grid world 
in order to avoid introducing a niche size in the algorithm. 

2 Multi-objective Optimization

Let us consider the following multi-objective optimization problem with n objectives:

Maximize $f_1(\mathbf{x}), f_2(\mathbf{x}), \ldots, f_n(\mathbf{x})$,    (1)

where $f_1(\cdot), f_2(\cdot), \ldots, f_n(\cdot)$ are n objectives. When the following inequalities
hold between two solutions x and y, the solution y is said to dominate the solution x:

$\forall i : f_i(\mathbf{x}) \leq f_i(\mathbf{y})$ and $\exists j : f_j(\mathbf{x}) < f_j(\mathbf{y})$.    (2)

If a solution is not dominated by any other solutions of the multi-objective 
optimization problem, that solution is said to be a Pareto-optimal solution. The task of 
multi-objective algorithms in this paper is not to select a single final solution but to 
find all Pareto-optimal solutions of the multi-objective optimization problem in (1). 
When we use heuristic search algorithms such as tabu search, simulated annealing,
and genetic algorithms for finding Pareto-optimal solutions, we usually cannot
confirm the optimality of obtained solutions. We only know that each of the obtained
solutions is not dominated by any other solutions examined during the execution of 
those algorithms. Therefore obtained solutions by heuristic algorithms are referred to 
as “nondominated” solutions. For a large-scale multi-objective optimization problem, 
it is impossible to find all Pareto-optimal solutions. Thus our task is to find many 
near-optimal nondominated solutions in a practically acceptable computational time. 
The performance of different multi-objective algorithms is compared based on several 
quality measures of obtained nondominated solutions. 
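
The dominance relation in (2) and the notion of nondominated solutions can be illustrated with a short Python sketch. The function names `dominates` and `nondominated` are illustrative assumptions, and objective vectors are represented as plain tuples under maximization, as in (1).

```python
def dominates(fy, fx):
    """True if objective vector fy dominates fx under maximization, as in (2)."""
    no_worse = all(a <= b for a, b in zip(fx, fy))
    strictly_better = any(a < b for a, b in zip(fx, fy))
    return no_worse and strictly_better


def nondominated(points):
    """Keep the objective vectors that no other vector in `points` dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


if __name__ == "__main__":
    examined = [(1, 5), (2, 4), (2, 3), (0, 0), (3, 1)]
    print(nondominated(examined))
```

The result is nondominated only with respect to the vectors that were examined, which is exactly the caveat made above about heuristic search.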

3 Multi-objective Genetic Algorithms (MOGA) 

In this section, we explain our MOGA [6], which is the basic algorithm of the C- 
MOGA (See Fig. 1). In our MOGA, the weighted sum of the n objectives is used as a 
fitness function: 

$f(\mathbf{x}) = w_1 f_1(\mathbf{x}) + w_2 f_2(\mathbf{x}) + \cdots + w_n f_n(\mathbf{x})$,    (3)

where $w_1, \ldots, w_n$ are nonnegative weights for the n objectives, which satisfy the
following relations:

$w_i \geq 0$ for i = 1, 2, ..., n,    (4)

$w_1 + w_2 + \cdots + w_n = 1$.    (5)

This fitness function is utilized when a pair of parent solutions are selected for 
generating a new solution by crossover and mutation. One characteristic feature of our 
MOGA is to randomly specify weight values whenever a pair of parent solutions are 
selected. That is, each selection (i.e., the selection of two parents) is performed based 
on a different weight vector. This means that each of newly generated solutions by the 
genetic operations has its own weight vector. The other characteristic feature of our 
MOGA is preserving all nondominated solutions which are obtained during the 
execution of the algorithm. We describe these characteristic features in the following 
subsections. 

3.1 Selection Operation

When a pair of parent solutions are to be selected from a current population in a
selection operation for generating an offspring by genetic operations, first the n
weight values ($w_1, w_2, \ldots, w_n$) are randomly specified as follows:

$w_i = random_i / (random_1 + \cdots + random_n)$, i = 1, 2, ..., n,    (6)

where $random_i$ is a nonnegative random real number. For example, when N pairs
of parent solutions are selected for generating a new population, N different weight
vectors are specified by (6). This means that N search directions are utilized in a
single generation. In other words, each selection (i.e., the selection of two parents) is
governed by a different fitness function.
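
The weight specification in (6) and the weighted-sum fitness in (3) can be sketched in Python as follows. This is a minimal sketch: the function names are hypothetical, the uniform draws on [0, 1) are an assumption (the text only requires nonnegative random numbers), and the fallback to equal weights when every draw is zero is also an assumption.

```python
import random


def random_weights(n, rng=random):
    """Weights w_i = random_i / (random_1 + ... + random_n), as in (6)."""
    draws = [rng.random() for _ in range(n)]  # assumption: uniform on [0, 1)
    total = sum(draws)
    if total == 0.0:
        # Assumption: fall back to equal weights if every draw is zero.
        return [1.0 / n] * n
    return [d / total for d in draws]


def scalar_fitness(objectives, weights):
    """Weighted sum f(x) = w_1 f_1(x) + ... + w_n f_n(x), as in (3)."""
    return sum(w * f for w, f in zip(weights, objectives))


if __name__ == "__main__":
    w = random_weights(2)
    print(w, scalar_fitness((3.0, 7.0), w))
```

Calling `random_weights` once per parent pair gives each selection its own search direction, as described above.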

3.2 Elitist Strategy 

Our MOGA separately stores two different sets of solutions: a current population and 
a tentative set of nondominated solutions. After genetic operations are applied to the 
current population, it is replaced with newly generated solutions. At the same time, 
the tentative set of nondominated solutions is updated. That is, if a newly generated 
solution is not dominated by any other solutions in the current population and the 
tentative set of nondominated solutions, this solution is added to the tentative set. 
Then all solutions dominated by the added one are removed from the tentative set. In 
this manner, the tentative set of nondominated solutions is updated at every 
generation in our MOGA. 

From the tentative set of nondominated solutions, a few solutions are randomly 
selected and added to the current population (see Fig. 2). The randomly selected 
nondominated solutions may be viewed as elite solutions because they are added to 
the current population with no modification. 
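
The update of the tentative set described above can be sketched in Python. This is a minimal sketch under stated assumptions: the name `update_tentative_set` is hypothetical, solutions are represented by their objective vectors (maximization), and skipping a candidate that is already stored is an assumption about duplicate handling that the text does not address.

```python
def dominates(fy, fx):
    """True if objective vector fy dominates fx under maximization."""
    return all(a <= b for a, b in zip(fx, fy)) and any(
        a < b for a, b in zip(fx, fy)
    )


def update_tentative_set(tentative, candidate, population=()):
    """Return the tentative set after considering one newly generated solution."""
    others = list(tentative) + list(population)
    if candidate in tentative or any(dominates(o, candidate) for o in others):
        return list(tentative)  # candidate is not added
    # Add the candidate and remove every stored solution it dominates.
    kept = [t for t in tentative if not dominates(candidate, t)]
    kept.append(candidate)
    return kept


if __name__ == "__main__":
    archive = []
    for objective_vector in [(1, 5), (2, 3), (2, 4), (0, 0)]:
        archive = update_tentative_set(archive, objective_vector)
    print(archive)
```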

When a multi-objective optimization problem has a non-convex Pareto front,
weighted sum approaches with constant weights fail to find all of its
Pareto-optimal solutions.
This is because those algorithms try to find a single optimal solution with respect to 
the fixed weights by their single trial. Our approach remedies such a difficulty by 
using variable weights and storing the tentative set of nondominated solutions. This 
set is updated by examining the Pareto optimality of every solution generated by 
genetic operations during the execution of the algorithm. It was shown in [6, 8] that 
our approach found nondominated solutions on a non-convex Pareto front of a two- 
objective continuous optimization problem. In [6, 8], our approach was also applied to 
two-objective flowshop scheduling problems with non-convex Pareto fronts. 

3.3 Algorithm 

Let us denote the population size by $N_{pop}$. We also denote the number of
nondominated solutions added to the current population by $N_{elite}$ (i.e., $N_{elite}$ is the
number of elite solutions, see Fig. 2). Using these notations, our MOGA can be
written as follows.

Step 0) Initialization: Randomly generate an initial population of $N_{pop}$ solutions.

Step 1) Evaluation: Calculate the values of the n objectives for each solution in the
        current population. Then update the tentative set of nondominated solutions.

Step 2) Selection: Repeat the following procedures to select ($N_{pop} - N_{elite}$) pairs of
        parent solutions.

        a) Randomly specify the weight values $w_1, w_2, \ldots, w_n$ in the fitness
           function (3) by (6).

        b) According to the following selection probability P(x), select a pair of
           parent solutions from the current population Ψ:

           $P(\mathbf{x}) = \dfrac{f(\mathbf{x}) - f_{\min}(\Psi)}{\sum_{\mathbf{y} \in \Psi} \{ f(\mathbf{y}) - f_{\min}(\Psi) \}}$,    (7)

           where $f_{\min}(\Psi)$ is the minimum fitness value in the current population Ψ.

Step 3) Crossover and Mutation: Apply a crossover operator to each of the selected
        ($N_{pop} - N_{elite}$) pairs of parent solutions. A new solution is generated from
        each pair of parent solutions. Then apply a mutation operator to the
        generated new solutions.

Step 4) Elitist Strategy: Randomly select $N_{elite}$ solutions from the tentative set of
        nondominated solutions, and add the selected $N_{elite}$ solutions to the
        ($N_{pop} - N_{elite}$) solutions generated in Step 3 to construct a population of
        $N_{pop}$ solutions.

Step 5) Termination Test: If a prespecified stopping condition is satisfied, end the
        algorithm. Otherwise, return to Step 1.
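
The selection step (Step 2b) can be sketched in Python as roulette-wheel selection on fitness values shifted by the population minimum. This is a minimal sketch, not the authors' code: the function names are hypothetical, the uniform fallback when all fitness values are equal is an assumption (the ratio is then undefined), and drawing the two parents with replacement is also an assumption.

```python
import random


def selection_probabilities(fitness):
    """P(x) = (f(x) - f_min) / sum over y of (f(y) - f_min), per solution."""
    f_min = min(fitness)
    shifted = [f - f_min for f in fitness]
    total = sum(shifted)
    if total == 0:
        # Assumption: all fitness values equal, so select uniformly.
        return [1.0 / len(fitness)] * len(fitness)
    return [s / total for s in shifted]


def select_pair(population, fitness, rng=random):
    """Draw two parents according to the selection probabilities."""
    probs = selection_probabilities(fitness)
    # Assumption: parents are drawn with replacement, so they may coincide.
    return rng.choices(population, weights=probs, k=2)


if __name__ == "__main__":
    print(selection_probabilities([1.0, 2.0, 4.0]))
    print(select_pair(["a", "b", "c"], [1.0, 2.0, 4.0]))
```

Note that the solution with the minimum fitness receives probability zero under this scaling whenever the fitness values are not all equal.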



[Figure: boxes labelled current population, next population and non-dominated
solutions.]

Fig. 2. Illustration of the elitist strategy in the MOGA

4 Cellular Algorithms

4.1 Relation between Cell Location and Weight Vector 

In cellular algorithms, each individual (i.e. a solution) resides in a cell in a spatially 
structured space (e.g., two-dimensional grid-world). For utilizing a cellular structure 
in our MOGA, we assign a different weight vector to each cell. For our n-objective 
optimization problem, cells are structured in an n-dimensional weight space. Fig. 3
shows an example of structured cells for a two-objective optimization problem
where the two weights $w_1$ and $w_2$ are used for the calculation of the fitness function
f(x) as $f(\mathbf{x}) = w_1 f_1(\mathbf{x}) + w_2 f_2(\mathbf{x})$. In this figure, the population size is eleven
because an individual exists in each cell. As shown in Fig. 3, the location of each cell
corresponds to its weight vector. In order to allocate cells on uniformly distributed 
weight vectors, we generate weight vectors systematically (not randomly). For 
example, weight vectors in Fig. 3 are (1.0, 0.0), (0.9, 0.1), ..., (0.0, 1.0). 

As shown in Fig. 3, we can easily generate uniform weight vectors on the
two-dimensional weight space. In order to generate uniformly distributed weight vectors
for multi-objective optimization problems with three or more objectives, we propose a
weight specification method on an n-dimensional grid world. Let us consider weight
vectors satisfying the following conditions.

$w_1 + w_2 + \cdots + w_n = d$,    (8)

$w_i \in \{0, 1, 2, \ldots, d\}$.    (9)

Fig. 3. Location of each cell in the two-dimensional weight space

These conditions show that weight vectors are generated by combining n
non-negative integers with the sum of d. In our cellular algorithm, a cell is located on
every weight vector satisfying the above conditions. Thus the number of cells (i.e., the
population size) is equal to the total number of weight vectors satisfying the above
conditions. This means that the population size is determined by d. For example,
when we specify d as d = 10 in (8) for the case of two-objective problems, we will
have eleven weight vectors (10, 0), (9, 1), ..., (0, 10). Each of these weight vectors has
the same direction as the corresponding weight vector in Fig. 3.

Fig. 4. Location of each cell in the three-dimensional weight space by the proposed method


This weight specification method is easily extended to the case with three or
more objectives. For example, Fig. 4 shows an example of the three-objective case
where d is specified as d = 4. From Fig. 4, we can observe that the value of d can be
considered as the number of partitions of the edge between two extreme points (e.g.,
(0,4,0) and (4,0,0)). By this weight specification method, we can uniformly distribute
cells on the n-dimensional space. We can calculate the number of cells generated for
n-objective problems as follows:



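$$N_2(d) = d + 1 \approx O(d), \tag{10}$$

$$N_3(d) = \sum_{i=0}^{d} N_2(i) = \sum_{i=0}^{d} (i + 1) = (d + 1)(d + 2)/2 \approx O(d^2), \tag{11}$$

$$N_4(d) = \sum_{i=0}^{d} N_3(i) = \sum_{i=0}^{d} (i + 1)(i + 2)/2 \approx O(d^3), \tag{12}$$

$$N_n(d) = \sum_{i=0}^{d} N_{n-1}(i) \approx O(d^{n-1}), \tag{13}$$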



where $N_j(d)$, $j = 2, \ldots, n$, is the number of generated cells for j-objective
problems. We can see from the above equations that the number of cells can be
calculated recursively. We also see that the order of the number of cells is $O(d^{n-1})$ for
n-objective problems. Since the number of cells is determined from the value of d, the
population size of our cellular algorithm can be specified by d. In other words, we 
should specify the value of d according to an appropriate population size. 
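The recursive count can be checked numerically. The sketch below is illustrative only; the name `num_cells` is ours. It implements the recursion of equations (10) to (13) and compares it with the binomial coefficient C(d + n - 1, n - 1), which is the standard closed form for the number of ways to write d as an ordered sum of n non-negative integers (this closed form is not stated in the text).

```python
from math import comb


def num_cells(n, d):
    """Number of cells N_n(d), using N_2(d) = d + 1 and N_n(d) = sum_{i=0}^{d} N_{n-1}(i)."""
    if n == 2:
        return d + 1
    return sum(num_cells(n - 1, i) for i in range(d + 1))


# Population sizes quoted in the text: d = 10 with two objectives,
# d = 13 with three objectives.
two_obj_size = num_cells(2, 10)
three_obj_size = num_cells(3, 13)

# The recursion agrees with the closed form for small n and d.
closed_form_ok = all(
    num_cells(n, d) == comb(d + n - 1, n - 1)
    for n in range(2, 6)
    for d in range(0, 9)
)
```

In practice one would pick the largest d whose cell count does not exceed the intended population size.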
4.2 Definition of Neighborhood

We can arbitrarily define a neighborhood structure among cells. That is, we can utilize
any distance between cells in the n-dimensional space in which cells are structured.
For example, the Euclidean distance can be used for measuring the distance between
cells. In this paper, we use the Manhattan distance. That is, we define the distance
between a cell with the weight vector $w = (w_1, w_2, \ldots, w_n)$ and another cell with
$v = (v_1, v_2, \ldots, v_n)$ as follows:

$$\mathrm{Distance}(w, v) = \sum_{i=1}^{n} |w_i - v_i|. \tag{14}$$

We define the neighborhood of the weight vector w as

$$N(w) = \{ v \mid \mathrm{Distance}(w, v) \le D \}. \tag{15}$$

For example, when D = 2 in Fig. 5, the cell with the weight vector (2,1,1) has six 
neighboring cells (i.e., shaded cells, (1,2,1), (1,1,2), (2,0,2), (3,0,1), (3,1,0), and 
(2,2,0) in Fig. 4) and that cell itself in its neighborhood. As shown in this example, the 
neighborhood of each cell is defined by its nearby cells within the distance D 
including that cell itself. 
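The neighborhood definition can be reproduced in a few lines. This is a sketch under the assumption that a cell belongs to the neighborhood when its Manhattan distance is at most D, which is what the six-neighbor example implies; the helper names `manhattan` and `neighborhood` are ours.

```python
def manhattan(w, v):
    """Manhattan distance between two weight vectors, as in equation (14)."""
    return sum(abs(wi - vi) for wi, vi in zip(w, v))


def neighborhood(w, cells, D):
    """All cells within Manhattan distance D of w, including w itself."""
    return [v for v in cells if manhattan(w, v) <= D]


# Cells of the three-objective example with d = 4: non-negative integer
# triples that sum to 4.
cells = [(a, b, 4 - a - b) for a in range(5) for b in range(5 - a)]
neighbors = neighborhood((2, 1, 1), cells, D=2)
```

Because every weight vector has the same component sum, the Manhattan distance between two cells is always even, so D = 2 selects exactly the cells that differ by moving one unit of weight from one objective to another.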

4.3 Selection 

Two parents for generating a new individual in a cell are selected from its 
neighborhood. When D = 2 in Fig. 5, the parent solutions for the cell on (2,1,1) can 
be selected from that cell and its six neighbors. It is noted that the fitness value of 
each neighbor is recalculated based on the weight vector assigned to the cell for 
which a new individual is generated. That is, each individual is differently evaluated 
by this recalculation procedure of the fitness function in the selection for each cell. 
This corresponds to the selection procedure of our original MOGA where the 
selection of each pair of parents was governed by a different weight vector (see Step 2
in Subsection 3.3). 

It is noted that the modification of the normalization condition from (5) to (8)
has no effect on the selection procedure (Step 2 (a) in Subsection 3.3). Let $w'$, which
satisfies the normalization condition (8), be a weight vector generated by the proposed
weight specification method. A weight vector $w = (w_1, w_2, \ldots, w_n)$ satisfying the
normalization condition (5) can be easily obtained from the relation
$w' = d \cdot w = (d w_1, d w_2, \ldots, d w_n)$. Let us denote our scalar fitness function with the
weight vectors $w$ and $w'$ by $f(x, w)$ and $f(x, w')$, respectively, for a solution x in
the current population. Since our scalar fitness function is linear with respect to
weight values, we have $f(x, w') = d \cdot f(x, w)$ and $f_{\min}(\Psi, w') = d \cdot f_{\min}(\Psi, w)$ in
the selection procedure defined by (7). Thus the same selection probability is obtained
from $w$ and $w'$ for each solution x in the current population $\Psi$. This means that the
modification of the normalization condition from (5) to (8) has no effect on the
selection procedure.
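The scale invariance can be illustrated numerically. Equation (7) is not reproduced in this section, so the sketch below assumes the common linear-scaling roulette-wheel form, in which the selection probability of x is proportional to f(x, w) minus the minimum fitness in the population. The function names and the sample objective values are hypothetical.

```python
def scalar_fitness(objectives, w):
    """Weighted sum of the objective values of one solution."""
    return sum(wi * fi for wi, fi in zip(w, objectives))


def selection_probabilities(population, w):
    """Roulette-wheel probabilities with linear scaling (assumed form of (7)).

    The sketch assumes that the solutions do not all have the same fitness.
    """
    fitness = [scalar_fitness(obj, w) for obj in population]
    f_min = min(fitness)
    shifted = [f - f_min for f in fitness]
    total = sum(shifted)
    return [s / total for s in shifted]


# Hypothetical objective vectors (negated costs, to be maximized).
population = [(-10.0, -4.0), (-8.0, -7.0), (-12.0, -3.0)]
p_normalized = selection_probabilities(population, (0.3, 0.7))  # weights sum to 1
p_grid = selection_probabilities(population, (3, 7))            # weights sum to d = 10
```

Both weight vectors yield the same probabilities up to rounding, because the factor d cancels between numerator and denominator.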
4.4 Other Genetic Operations

In Subsections 4.1 to 4.3, we described the characteristic features of our C-MOGA.
As for other genetic operations such as crossover, mutation, and the elite
preservation strategy, we can employ the same operations that are used in the
MOGA. That is, the same crossover and mutation operations can be employed for the
C-MOGA. As shown in Fig. 2, some solutions are randomly selected from the tentative set of
nondominated solutions and added to the current population as elite solutions.

5 Computer Simulations 

5.1 Test Problems 

We applied the C-MOGA with the proposed weight specification method to flowshop 
scheduling problems. Flowshop scheduling is one of the most well-known scheduling 
problems. Since Johnson’s work [21], various scheduling criteria have been 
considered. Among them are makespan, maximum tardiness, total tardiness, and total 
flowtime. Several researchers extended single-objective flowshop scheduling 
problems to multi-objective problems (see, for example, Daniels & Chambers [22]). 

In this paper, we use the makespan and the total tardiness as two scheduling 
criteria in our two-objective flowshop scheduling problems. The makespan is the 
maximum completion time of all jobs to be processed. The total tardiness is the total 
overdue of all jobs. We also employ the total flowtime together with these two criteria 
in our three-objective flowshop scheduling problems. The total flowtime is the total 
completion time of all jobs. Let $g_1(x)$, $g_2(x)$, and $g_3(x)$ be the makespan, the total
tardiness, and the total flowtime, respectively. Since these scheduling criteria are to be
minimized, we specify the three objectives $f_1(x)$, $f_2(x)$ and $f_3(x)$ of our flowshop
scheduling as $f_1(x) = -g_1(x)$, $f_2(x) = -g_2(x)$ and $f_3(x) = -g_3(x)$.

Since flowshop scheduling is to find a job permutation that optimizes the given 
objectives, a sequence of jobs is handled as an individual (i.e., as a string) in our 
algorithm. As test problems, we generated ten 20-job and 10-machine flowshop 
scheduling problems with two and three objectives. The processing time of each job 
on each machine was specified as a random integer in the interval [1, 99], and the 
due date of each job was defined randomly. Our task is to find a set of Pareto-optimal
solutions of each test problem. 
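The three criteria can be computed from a job permutation as sketched below. The sketch assumes a permutation flowshop in which every job visits the machines in the same order, all jobs are available at time zero, and the tardiness of a job is max(0, completion time - due date). The function name and the small data set are hypothetical.

```python
def flowshop_objectives(perm, proc, due):
    """Return (f1, f2, f3) = (-makespan, -total tardiness, -total flowtime).

    perm: job indices in processing order
    proc: proc[j][m] is the processing time of job j on machine m
    due:  due[j] is the due date of job j
    """
    n_machines = len(proc[0])
    finish = [0] * n_machines  # completion time of the latest job on each machine
    completion = {}
    for job in perm:
        for m in range(n_machines):
            # A job starts on machine m when the machine is free and the job
            # has left machine m - 1.
            ready = finish[m - 1] if m > 0 else 0
            finish[m] = max(finish[m], ready) + proc[job][m]
        completion[job] = finish[-1]
    makespan = max(completion.values())
    total_tardiness = sum(max(0, completion[j] - due[j]) for j in perm)
    total_flowtime = sum(completion.values())
    return -makespan, -total_tardiness, -total_flowtime


# Hypothetical 2-job, 2-machine instance.
proc = [[3, 2], [1, 4]]
due = [4, 6]
objectives_a = flowshop_objectives([0, 1], proc, due)
objectives_b = flowshop_objectives([1, 0], proc, due)
```

The values are negated so that all three objectives are maximized, in line with the definitions of $f_1$, $f_2$ and $f_3$ above.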



5.2 Quality Measures of Solution Sets 

Since multi-objective algorithms find a set of nondominated solutions with respect to 
multiple objectives (not a single final solution with respect to a single objective), the 
comparison between different multi-objective algorithms is not easy. For this purpose, 
we use the following measures for evaluating the quality of a solution set obtained by 
each algorithm. 







1) The number of obtained nondominated solutions 

The number of nondominated solutions obtained by each algorithm is a measure to 
evaluate the variety of the solution set. 

2) The number of solutions that are not dominated by other solution sets 

For comparing different solution sets with one another, we examine whether each 
solution is dominated by any other solutions in other sets. If a solution is dominated 
by another solution, we remove that solution. In this manner, we remove solutions 
dominated by other solution sets. The number of remaining solutions in each solution 
set is a measure for evaluating its relative quality with respect to the other solution 
sets. 
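Measure 2 amounts to a pairwise dominance check between solution sets. A minimal sketch follows, assuming maximization of all objectives as in this paper; the function names and the sample sets are ours.

```python
def dominates(a, b):
    """True if objective vector a dominates b (all objectives maximized)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def count_not_dominated(own, others):
    """Number of solutions in `own` that no solution in `others` dominates."""
    return sum(1 for s in own if not any(dominates(o, s) for o in others))


# Hypothetical solution sets of two algorithms (negated costs).
set_a = [(-5, -9), (-7, -6), (-9, -4)]
set_b = [(-6, -6), (-10, -3)]
remaining_a = count_not_dominated(set_a, set_b)
remaining_b = count_not_dominated(set_b, set_a)
```

When more than two algorithms are compared, `others` would be the union of the solution sets of all competing algorithms.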

3) Set quality measure proposed by Esbensen 

Esbensen [23] proposed an evaluation method of the quality of a solution set. Let us
denote a solution set by $\Omega$. The best solution $x^*$ for a given weight vector
$w = (w_1, w_2, \ldots, w_n)$ can be chosen from $\Omega$ for the n-objective optimization problem
as follows:

$$f(x^*) = w_1 f_1(x^*) + w_2 f_2(x^*) + \cdots + w_n f_n(x^*)
= \max\{ w_1 f_1(x) + w_2 f_2(x) + \cdots + w_n f_n(x) \mid x \in \Omega \}. \tag{16}$$

Esbensen [23] proposed an idea of measuring the quality of the solution set $\Omega$
by calculating the expected value of $f(x^*)$ over possible weight vectors. In this
paper, we calculate the expected value of $f(x^*)$ by randomly generating 10,000
weight vectors by (6). That is, the quality of the solution set $\Omega$ is calculated as
follows:

$$q(\Omega) = \frac{1}{10000} \sum_{i=1}^{10000} \max\{ w_1^i f_1(x) + w_2^i f_2(x) + \cdots + w_n^i f_n(x) \mid x \in \Omega \}, \tag{17}$$

where $q(\Omega)$ is the quality of the solution set $\Omega$ and $w^i = (w_1^i, w_2^i, \ldots, w_n^i)$,
$i = 1, 2, \ldots, 10000$, are randomly specified weight vectors.
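Equation (17) is a Monte Carlo average and can be sketched as follows. Equation (6) of this paper is not reproduced in this section, so the sketch assumes that a random weight vector is obtained by drawing n uniform random numbers and dividing each by their sum. The function names are hypothetical.

```python
import random


def random_weight_vector(n, rng):
    """Random non-negative weights that sum to 1 (assumed form of the random weights)."""
    r = [rng.random() for _ in range(n)]
    total = sum(r)
    return [ri / total for ri in r]


def set_quality(solutions, n_samples=10000, seed=0):
    """Average, over random weight vectors, of the best weighted sum in the set."""
    rng = random.Random(seed)
    n = len(solutions[0])
    total = 0.0
    for _ in range(n_samples):
        w = random_weight_vector(n, rng)
        total += max(sum(wi * fi for wi, fi in zip(w, f)) for f in solutions)
    return total / n_samples


# Hypothetical two-objective solution set (negated costs).
quality = set_quality([(-10.0, -4.0), (-5.0, -9.0)], n_samples=1000)
```

Since every objective is a negated cost, the quality value is negative and a value closer to zero indicates a better solution set.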

5.3 Simulation Results on Two-Objective Flowshop Scheduling Problems

In our computer simulations, we applied three algorithms (i.e., the MOGA with 
random weights, the MOGA with weights generated by the proposed method, and the 
C-MOGA) to test problems in order to show the effectiveness of the weight 
specification method and compare the search ability of the C-MOGA with that of the 
MOGAs. We employed the following parameter specifications: 

Crossover: Two-point order crossover (crossover rate: 0.8), 

Mutation: Shift mutation (mutation rate: 0.3), 

Number of elite solutions: $N_{elite} = 3$,

Neighborhood structure for the local search: Shift, 

Stopping condition: Examination of 50,000 solutions. 
We used the above stopping condition in order to compare the three algorithms
under the same computation load. In a single trial of each algorithm, 50,000 solutions
were examined. The parameter d, which determines the population size, was specified
as d = 100 for two-objective problems. This means that the population size was 101
(see equation (10)). The weight vectors of the 101 cells were $(w_1, w_2)$ = (100, 0), (99,
1), ..., (0, 100). For the C-MOGA, we specified the value of D as D = 20. Therefore
parent solutions for each cell are selected from neighboring cells within the
Manhattan distance 20.

We examined the effect of the introduction of the weight specification method and 
the cellular structure (i.e., the locally restricted genetic operations) in this experiment. 
We obtained a set of nondominated solutions by each algorithm. In Table 1, we 
summarize the average results over 100 trials for each algorithm (i.e. 10 trials for each 
of 10 test problems). In this table, “A” is the number of nondominated solutions 
obtained by each algorithm, and “B” is the number of solutions that are not dominated 
by other solutions obtained by the other algorithm. The ratio of these two numbers is 
shown in the column of B/A. “Quality” is the set quality measure of Esbensen, and 
“SD of Q” shows the standard deviation of the value of Quality. In the calculation of 
“SD of Q”, we averaged the standard deviation for each of ten test problems. 

From Table 1, we can see that most solutions obtained by the MOGA with random 
weights are dominated by solutions obtained by the MOGA with the proposed method 
or the C-MOGA. Thus we can conclude that the weight specification method 
proposed in this paper is effective in the MOGA and the C-MOGA. We can also 
observe that the average and the standard deviation of the Quality value for the C- 
MOGA are better than those for the MOGAs. 

Next, we examined the specification of the parameter D in the C-MOGA. Table 2 
shows the average results over 100 trials for each specification of the neighborhood 
structure (i.e., each specification of D) in the C-MOGA. In this table, the C-MOGA 
with D = 200 is the MOGA with the proposed weight specification method. When 
D = 200 , all solutions in the current population are considered as neighbors of every 
cell in the selection procedure. From Table 2, we can observe that the restriction of 
the genetic operations within some neighboring cells is effective for improving the 
ability to find good sets of nondominated solutions. 

Table 1. Comparison of MOGA with C-MOGA (Two-Objective)

                     A       B      B/A     Quality    SD of Q
MOGA (random)       14.6     2.8    0.200   -1063.4    48.4
MOGA (proposed)     17.1    10.8    0.648    -967.4    15.2
C-MOGA (D = 20)     17.5     9.3    0.536    -963.6    10.6

A: The number of nondominated solutions of the method.
B: The number of nondominated solutions that are not dominated by those obtained by the
other method.
Quality: Set quality measure of Esbensen.
SD of Q: Standard deviation of Quality.
Table 2. Effect of the choice of D in C-MOGA (Two-Objective)

D             4        10       20       50       200
Quality    -976.4   -968.6   -963.6   -966.0   -967.4
SD of Q      17.9     14.3     10.6     12.9     15.2



Table 3. Comparison of MOGA with C-MOGA (Three-Objective)

             A       B      B/A     Quality    SD of Q
MOGA        44.8     8.4    0.190   -8004.7    65.54
C-MOGA      61.2    63.8    0.966   -7850.9    42.13



Table 4. Effect of the choice of d in C-MOGA (Three-Objective)

d                     10        11        12        13        14
Population Size       66        78        81       105       120
# of Generations     758       642       618       477       417
Quality           -7871.0   -7866.7   -7852.9   -7850.9   -7864.7



5.4 Simulation Results on Three-Objective Flowshop Scheduling Problems 

We also applied the C-MOGA and the MOGA with random weights to three- 
objective test problems. We used the same parameter specifications as in the previous 
subsection except for the population size. Since we defined d = 13 for the C-MOGA,
the population size was 105 from the equation (11). In order to compare the two 
algorithms under the same computation load, we specified the population size in the 
MOGA as 105. Simulation results are summarized in Table 3 and Table 4. From 
Table 3, we can see that the performance of the C-MOGA is better than that of the 
MOGA. Table 4 shows the effect of the choice of a value of d on the performance of 
the C-MOGA. The second row shows the population size calculated from the value of 
d. It is noted that each algorithm with a different value of d was terminated when the 
number of examined solutions exceeded the termination condition. The number of 
generations is shown in the third row of Table 4 for each specification of d. From 
Table 4, we can see that the best quality value was obtained by the C-MOGA with
d = 13.
6 Conclusion

In this paper, we proposed a weight specification method for the cellular multi- 
objective genetic algorithm (C-MOGA), which is an extension of a multi-objective 
genetic algorithm (MOGA) in our former study (Murata & Ishibuchi [6]). In the 
proposed C-MOGA, each individual is located in a cell with a different weight vector. 
This weight vector governs the selection operation. The selection is performed in the 
neighborhood of each cell. The effectiveness of the C-MOGA with the proposed 
weight specification method was demonstrated by computer simulations on two- and 
three-objective flowshop scheduling problems. 
Abstract. The conventional weighted aggregation method is extended
to realize multi-objective optimization. The basic idea is that systemat- 
ically changing the weights during evolution will lead the population to 
the Pareto front. Two possible methods are investigated. One method is 
to assign a uniformly distributed random weight to each individual in the 
population in each generation. The other method is to change the weight 
periodically with the process of the evolution. We found in both cases 
that the population is able to approach the Pareto front, although it 
will not keep all the found Pareto solutions in the population. Therefore, 
an archive of non-dominated solutions is maintained. Case studies are 
carried out on some of the test functions used in [?] and [?]. Simulation
results show that the proposed approaches are simple and effective. 



1 Introduction 



A large number of evolutionary multiobjective algorithms (EMOA) have been
proposed [?]. So far, there are three main approaches to evolutionary multiobjective
optimization, namely, aggregation approaches, population-based non-Pareto
approaches and Pareto-based approaches [?]. In recent years, the
Pareto-based approaches have gained increasing attention in the evolutionary
computation community and several successful algorithms have been proposed
[?].



Despite their weaknesses, the aggregation approaches are very easy to im- 
plement and computationally efficient. Usually, aggregation approaches can pro- 
vide only one Pareto solution if the weights are fixed using problem-specific 
prior knowledge. However, it is also possible to find more than one Pareto solution
using this method by changing the weights during optimization. In [?],
the weights of the different objectives are encoded in the chromosome to obtain 
more than one Pareto solution. Phenotypic fitness sharing is used to keep the 
diversity of the weight combinations and mating restrictions are required so that 
the algorithm can work properly. 

Most of the EMOAs are based on Genetic Algorithms and relatively little
attention has been paid to evolution strategies. Some exceptions are [?]. In
[?], average ranking is used to dictate the deletion of a fraction of the population.



E. Zitzler et al. (Eds.): EMO 2001, LNCS 1993, pp. 90- 177?! 2001. 
(c) Springer- Verlag Berlin Heidelberg 2001 
A predator-prey model is proposed in [?]. A selection method that is similar to
the VEGA approach [?] is adopted in [?]. An algorithm called Pareto Archived
Evolution Strategy (PAES) is suggested in [?], in which a non-Pareto approach
together with an archive of the found Pareto solutions is used.

This paper investigates two methods using the aggregation-based approach. 
To approximate the Pareto front instead of a certain Pareto solution, the weight 
for each objective should be changed systematically. One method is to dis- 
tribute the weights uniformly among the individuals in the population. The 
other method is to periodically change the weights with the process of the evolution.
Although these methods seem to be very simple, we will show that they
work effectively for two-objective optimization problems. Simulations are carried
out on different test functions studied in [?]. Different evolution strategies, including
the standard evolution strategy [?], the Evolution Strategy with Rotation
Matrix Adaptation [?] and the Evolution Strategy with Covariance Matrix
Adaptation [?], are employed.

What is quite surprising from our simulation results is that our algorithms 
work well even for problems with a concave Pareto front (see Section 4 for 
details), which is usually thought to be not obtainable by aggregation-based
methods. Our preliminary explanation is that if the search algorithm goes
through the concave region of the Pareto front (which is locally near-optimal 
when the objectives are aggregated into one single objective function) and if the 
near optimal solutions are archived, then the Pareto solutions within the concave 
region can also be found using the aggregation method. Further results on this 
issue will be reported elsewhere. 

2 The Aggregation Based Multiobjective Algorithms 

2.1 The Evolution Strategies 

In the standard Evolution Strategy (ES), the mutation of the objective parameters
is carried out by adding an $N(0, \sigma_i^2)$ distributed random number. The step
sizes $\sigma_i$ are also encoded in the genotype and subject to mutations. A standard
Evolution Strategy can be described as follows:

$$x(t) = x(t-1) + \tilde{z}, \tag{1}$$

$$\sigma_i(t) = \sigma_i(t-1) \exp(\tau' z) \exp(\tau z_i); \quad i = 1, \ldots, n, \tag{2}$$

where $x$ is an n-dimensional parameter vector to be optimized, $\tilde{z}$ is an n-dimensional
random number vector with $\tilde{z} \sim N(0, \sigma(t)^2)$, and $z$ and $z_i$ are normally
distributed random numbers with $z, z_i \sim N(0, 1)$. Parameters $\tau$, $\tau'$ and $\sigma_i$ are
the strategy parameters, where $\sigma_i$ is mutated as in equation (2) and $\tau$, $\tau'$ are
constants as follows:

$$\tau = \left(\sqrt{2\sqrt{n}}\right)^{-1}; \quad \tau' = \left(\sqrt{2n}\right)^{-1}. \tag{3}$$
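One mutation step of the standard ES can be sketched as below. This is an illustration under stated assumptions rather than the authors' code: the learning rates follow the commonly used settings, one divided by the square root of 2√n for the individual factor and one divided by the square root of 2n for the global factor, the step sizes are mutated first, and the new step sizes are then used to perturb the parameters. The function name `mutate` is ours.

```python
import math
import random


def mutate(x, sigma, rng):
    """One self-adaptive mutation step following equations (1) and (2)."""
    n = len(x)
    tau = 1.0 / math.sqrt(2.0 * math.sqrt(n))   # rate for the individual factor
    tau_prime = 1.0 / math.sqrt(2.0 * n)        # rate for the global factor
    z = rng.gauss(0.0, 1.0)                     # one draw shared by all step sizes
    new_sigma = [
        s * math.exp(tau_prime * z) * math.exp(tau * rng.gauss(0.0, 1.0))
        for s in sigma
    ]
    # Perturb each parameter with a normal deviate scaled by its new step size.
    new_x = [xi + rng.gauss(0.0, si) for xi, si in zip(x, new_sigma)]
    return new_x, new_sigma


child_x, child_sigma = mutate([0.0] * 5, [0.1] * 5, random.Random(1))
```

The log-normal update keeps every step size strictly positive, which is why the step sizes can be evolved without explicit bounds in this sketch.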
There are several extensions to the above standard ES. In our simulations,
an ES with Rotation Matrix Adaptation and an ES with Covariance Matrix
Adaptation as well as the standard ES are used to investigate the effectiveness of
the proposed multiobjective algorithms using different search strategies. For the
detailed description of the evolution strategies, please refer to [?], respectively.

Two main different selection schemes are used in evolution strategies. Suppose
there are $\mu$ and $\lambda$ individuals in the parent and offspring population, usually
$\mu < \lambda$. One method is to select the $\mu$ parent individuals only from the $\lambda$ offspring,
which is usually noted as $(\mu, \lambda)$-ES. If the $\mu$ parent individuals are selected from
a combination of the $\mu$ parent individuals and the $\lambda$ offspring individuals, the
algorithm is noted as $(\mu + \lambda)$-ES. In our study, the $(\mu, \lambda)$-ES is adopted.

2.2 Random Distribution of Weights within a Population 

For the sake of clarity, we consider two-objective problems in the current discussion;
the extension to problems with more than two objectives is straightforward.
For a conventional aggregation method, the fitness function is the weighted
sum of the two different objectives $f_1$ and $f_2$:

$$\mathrm{Fitness} = w_1 f_1 + w_2 f_2, \tag{4}$$

where $w_1$ and $w_2$ are two constants determined using a priori knowledge about
the problem. It is clear that by using a pair of fixed weights, only one Pareto
solution can be obtained.

Imagine that we run the algorithm so many times that every weight combination
has been used. In this way, we can obtain all Pareto solutions that the
Pareto front consists of. Notice that it has been argued that the Pareto solutions
located in the concave region of the Pareto front cannot be obtained by
aggregation methods. However, in the experiments in Section 4, we found that
our algorithms are successful in obtaining a very complete concave Pareto front
for low-dimensional problems (e.g. n = 2), and a quite complete concave Pareto
front with a dimension as high as 10.

Of course, it is impractical, if not impossible, to run the evolutionary algorithm
so many times to exhaust all the weight combinations. Since we are using
evolutionary optimization, it is natural to take advantage of the population for
this purpose. If the different weight combinations can be distributed among the
individuals, the population may be able to approach the Pareto front during
the process of evolution. Suppose we use the $(\mu, \lambda)$-ES, then the weight combinations
can be distributed uniformly among the $\lambda$ individuals in the offspring
population. Let

$$w_1^i(t) = \mathrm{random}(\lambda)/\lambda, \tag{5}$$

$$w_2^i(t) = 1.0 - w_1^i(t), \tag{6}$$

where $i = 1, 2, \ldots, \lambda$ and t is the index for generation number. The function
$\mathrm{random}(\lambda)$ generates a uniformly distributed random number between 0 and $\lambda$.
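The random weight assignment of equations (5) and (6) can be sketched as follows. The helper names `random_weights` and `aggregated_fitness` are hypothetical; the sketch draws one weight pair per offspring individual and would be called again in every generation.

```python
import random


def random_weights(lam, rng):
    """One (w1, w2) pair per offspring individual, as in equations (5) and (6)."""
    weights = []
    for _ in range(lam):
        w1 = rng.uniform(0.0, lam) / lam   # random(lambda) / lambda
        weights.append((w1, 1.0 - w1))
    return weights


def aggregated_fitness(objectives, weights):
    """Weighted sum of two objective values, as in equation (4)."""
    f1, f2 = objectives
    w1, w2 = weights
    return w1 * f1 + w2 * f2


generation_weights = random_weights(100, random.Random(0))
example_fitness = aggregated_fitness((2.0, 4.0), (0.25, 0.75))
```

Dividing a uniform number on [0, λ] by λ is equivalent to drawing w1 uniformly from [0, 1]; the sketch keeps the two-step form only to mirror equation (5).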
[Figure panels: Generation t; Generation t+1; Evolutionary Optimization]

Fig. 1. Determination of the weights: Random distribution.



In this way, we can get a uniformly distributed random weight combination
$(w_1^i, w_2^i)$ among the individuals, where $0 \le w_1^i, w_2^i \le 1$ and $w_1^i + w_2^i = 1$; refer also
to Fig. 1, in which U(0, 1) denotes a uniform distribution. Notice that the weight
combinations are regenerated in every generation.



2.3 Generation-Based Periodical Variation of the Weights 

The idea of a uniformly distributed weight combination can straightforwardly 
be extended to a generation based approach. However, if we still use a random 
weight combination, convergence of the evolutionary algorithm will be in ques- 
tion. Therefore, instead of using a randomly distributed weight combination, we 
use a weight combination that is changed gradually and periodically with the 
process of the evolution. In this work, this is realized as follows: 

$$w_1(t) = |\sin(2\pi t / F)|, \tag{7}$$

$$w_2(t) = 1.0 - w_1(t), \tag{8}$$

where t is the number of generation and $|\cdot|$ gives the absolute value. We can
see from equation (7) that $w_1(t)$ changes from 0 to 1 periodically. The change
frequency can be adjusted by F. In our study, we set F = 400, which means
that in every 400 generations, $w_1$ will change from 0 to 1 and then from 1 to 0
four times. Fig. 2 shows an example of how the weights change during evolution
within 200 generations. We found that the results of the algorithm are not very
sensitive to F, although it seems reasonable to let the weight change from 0 to 1
twice. Notice in this case, all the individuals have the same weight combination
in the same generation.
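The generation-based schedule is a one-line computation. The sketch below assumes that the argument of the sine in equation (7) is 2πt/F; the function name `periodic_weights` is ours.

```python
import math


def periodic_weights(t, F=400):
    """Weights shared by all individuals in generation t, equations (7) and (8)."""
    w1 = abs(math.sin(2.0 * math.pi * t / F))
    return w1, 1.0 - w1


# Weights for the first few generations of a run.
schedule = [periodic_weights(t) for t in range(0, 101, 25)]
```

Unlike the random distribution method, the weights here are a deterministic function of the generation counter, so two runs with the same F follow the same weight trajectory.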
Fig. 2. Determination of weights: Generation-based periodical variation, all individuals
have the same weight combination within a generation.



2.4 An Archive of Pareto Solutions 

In our algorithm, the population is not able to keep all the found Pareto solutions,
although it is able to approach the Pareto front dynamically. Therefore, it is
necessary to record the Pareto solutions that have been found so far. The idea of
building a Pareto archive is inspired by [?], although we use quite a different
method to aggregate and maintain the archive. The pseudo-code for building
the archive is listed in Algorithm 1. The similarity is measured by the Euclidean
distance in the fitness space. It should be noticed that it is possible for one
solution in the archive to be dominated by another.

3 Test Functions 

To evaluate the effectiveness of the proposed algorithms, simulations are carried
out on four test functions used in [?].

- The first test function ($F_1$) used here is the second function in [?] and we
extend it to an n-dimensional function:

$$f_1 = \frac{1}{n} \sum_{i=1}^{n} x_i^2, \tag{9}$$

$$f_2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - 2.0)^2. \tag{10}$$

- The second test function ($F_2$) is the first function in [?], which has a convex
Pareto front:

$$f_1 = x_1, \tag{11}$$

$$g(x_2, \ldots, x_n) = 1.0 + \frac{9}{n-1} \sum_{i=2}^{n} x_i, \tag{12}$$

$$f_2 = g \times \left(1.0 - \sqrt{f_1/g}\right), \tag{13}$$

where $x_i \in [0, 1]$.
for each individual o in offspring population do
    if (o dominates an individual in parent population p) and (o is not dominated
       by any solutions in the archive) and (o is not similar to any solutions in the
       archive) then
        if archive is not full then
            add o to the archive
        else if o dominates any solution a in the archive then
            replace a with o
        else if any solution a1 in the archive dominates another solution a2 then
            replace a2 with o
        else
            discard o
        end if
    else
        discard o
    end if
end for
for each solution in the archive do
    if solution a1 dominates a2 then
        remove a2
    end if
end for

Algorithm 1: Pseudo-code for maintaining an archive of Pareto solutions.
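Algorithm 1 can be turned into runnable form as sketched below. Several details are our assumptions and not part of the pseudo-code: individuals are represented by their objective vectors only, all objectives are minimized, two solutions count as similar when their Euclidean distance in the fitness space is below a threshold `min_dist`, and the archive capacity is a parameter. All names are hypothetical.

```python
import math


def dominates(a, b):
    """True if objective vector a dominates b (all objectives minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def update_archive(archive, offspring, parents, capacity, min_dist):
    """Update the archive in place following Algorithm 1 and return it."""
    for o in offspring:
        accept = (
            any(dominates(o, p) for p in parents)
            and not any(dominates(a, o) for a in archive)
            and all(math.dist(o, a) >= min_dist for a in archive)
        )
        if not accept:
            continue  # discard o
        if len(archive) < capacity:
            archive.append(o)
            continue
        # Archive is full: overwrite a member dominated by o, otherwise a
        # member dominated by another member; if neither exists, discard o.
        victim = next((i for i, a in enumerate(archive) if dominates(o, a)), None)
        if victim is None:
            victim = next(
                (i for i, a in enumerate(archive)
                 if any(dominates(b, a) for b in archive)),
                None,
            )
        if victim is not None:
            archive[victim] = o
    # Final pass: remove every member dominated by another member.
    archive[:] = [a for a in archive if not any(dominates(b, a) for b in archive)]
    return archive


parents = [(4.0, 4.0)]
offspring = [(3.0, 3.0), (1.0, 5.0), (3.05, 3.0), (2.0, 2.0)]
archive = update_archive([], offspring, parents, capacity=5, min_dist=0.1)
```

In this usage example only (2.0, 2.0) survives: (1.0, 5.0) does not dominate any parent, (3.05, 3.0) is dominated by the archived (3.0, 3.0), and (3.0, 3.0) itself is removed in the final pass.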



- The third test function ($F_3$) is the second function in [?], which has a concave
Pareto front:

$$f_1 = x_1, \tag{14}$$

$$g(x_2, \ldots, x_n) = 1.0 + \frac{9}{n-1} \sum_{i=2}^{n} x_i, \tag{15}$$

$$f_2 = g \times \left(1.0 - (f_1/g)^2\right), \tag{16}$$

where $x_i \in [0, 1]$.

- The fourth test function ($F_4$) is the third function in [?], whose Pareto front
consists of a number of separated convex parts:

$$f_1 = x_1, \tag{17}$$

$$g(x_2, \ldots, x_n) = 1.0 + \frac{9}{n-1} \sum_{i=2}^{n} x_i, \tag{18}$$

$$f_2 = g \times \left(1.0 - \sqrt{f_1/g} - (f_1/g) \sin(10\pi f_1)\right), \tag{19}$$

where $x_i \in [0, 1]$.
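For reference, the four test functions can be written compactly as below. This sketch assumes the forms f1 = (1/n) Σ x_i², f2 = (1/n) Σ (x_i - 2)² for the first function and g = 1 + 9/(n - 1) Σ_{i≥2} x_i for the other three, with n of at least 2; the function names are ours.

```python
import math


def eval_f1(x):
    """First test function: two averaged quadratic objectives."""
    n = len(x)
    return (sum(xi ** 2 for xi in x) / n,
            sum((xi - 2.0) ** 2 for xi in x) / n)


def _g(x):
    """Shared auxiliary function g(x_2, ..., x_n); requires len(x) >= 2."""
    return 1.0 + 9.0 * sum(x[1:]) / (len(x) - 1)


def eval_f2(x):
    """Second test function (convex Pareto front)."""
    f1, g = x[0], _g(x)
    return f1, g * (1.0 - math.sqrt(f1 / g))


def eval_f3(x):
    """Third test function (concave Pareto front)."""
    f1, g = x[0], _g(x)
    return f1, g * (1.0 - (f1 / g) ** 2)


def eval_f4(x):
    """Fourth test function (Pareto front made of separated convex parts)."""
    f1, g = x[0], _g(x)
    return f1, g * (1.0 - math.sqrt(f1 / g)
                    - (f1 / g) * math.sin(10.0 * math.pi * f1))
```

For the last three functions g attains its minimum value 1 when x_2 = ... = x_n = 0, and the Pareto front is traced by varying x_1 with the remaining variables held at zero.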
4 Case Studies

The goal of the simulations is to verify the effectiveness of the proposed algo- 
rithms using three different evolution strategies. The first part of the simulation 
is to demonstrate that both the random distribution of the weights among the 
population (hereafter noted as Method 1) and the generation based periodical 
variation of the weights (hereafter called Method 2) proposed in Section 2 are 
able to approximate the Pareto front for different low-dimensional problems, 
even for a concave Pareto front, which is usually considered to be impossible 
by aggregation methods. Please refer to our comments in Section 1. The second 
part of the simulation is intended to compare the effectiveness of Method 1 and 
Method 2 for high-dimensional problems. It is found that Method 2 is more effec- 
tive for finding a complete Pareto front. Finally, we also show that different ES 
algorithms have different performance for multiobjective optimization with the 
same fitness assignment method. From these comparative studies, we conclude 
that the ES with Rotation Matrix Adaptation using Method 2 gives consistently 
satisfying results on different kinds of problems. 

In all the simulations, a (15, 100)-ES is used and the maximal number of
generations is 500.



4.1 Comparison of Both Methods for Low Dimensional Problems 

In this part, we use the standard ES for all the four test functions with dimension
n = 2. In the figures hereafter, the stars ('*') denote the archived solutions and
the plus signs ('+') denote the individuals in the offspring population. Figures
3(a) and (b) show the archived solutions and the offspring in the 10th and 500th
generation, respectively, using Method 1 for $F_1$. It is clearly seen that the population
has successfully approximated the Pareto front. Figures 4, 5 and 6 show
the corresponding results for $F_2$, $F_3$ and $F_4$ using Method 1. From the figures,




Fig. 3. Method 1 for $F_1$ (n=2) using the standard ES: (a) generation 10, (b) generation
500.

Fig. 4. Method 1 for $F_2$ (n=2) using the standard ES: (a) generation 10, (b) generation
500.

Fig. 5. Method 1 for $F_3$ (n=2) using the standard ES: (a) generation 10, (b) generation
500.



we find that Method 1 is working effectively for all different test functions, which
shows that the idea of using a uniformly random weight distribution among the
individuals of the population is feasible for multi-objective optimization.

Next, we run simulations on all four test functions with n = 2 using Method
2. The results are shown in Figures 7, 8, 9 and 10.

From these figures, it is demonstrated that Method 2 has been successful in 
obtaining a very complete Pareto front for all the four test functions. The differ- 
ence between Method 1 and Method 2 is that the individuals in the population 
in Method 2 converged more completely to the Pareto front than the individuals 
in Method 1 at the end of the evolution. 

The most interesting fact is that both methods have obtained very complete 
Pareto solutions for $F_3$, which has a concave Pareto front. Our empirical results
show that in principle, concave Pareto solutions can be obtained by aggregation 
methods if the search algorithm is able to go through the concave region and if 
an archive is used to store the found Pareto solutions. 
Fig. 6. Method 1 for $F_4$ (n=2) using the standard ES: (a) generation 10, (b) generation
500.

Fig. 7. Method 2 for $F_1$ (n=2) using the standard ES: (a) generation 10, (b) generation
500.

4.2 Comparison of Both Methods for High Dimensional Problems 

The performance of Method 1 and Method 2 is compared in this part of the 
simulations. The purpose of the simulations is to investigate the efficiency of 
Method 1 and Method 2 for more complex problems. In the results presented in 
the following, the standard ES is used for the four test functions with a dimension 
of 10 and 500 generations are run. Figures 11 and [?] give the results using
Method 1. It can be seen that the performance of Method 1 on 10-dimensional
problems becomes worse compared to that obtained on the two-dimensional
problems. In contrast, Method 2 is still able to provide very good results on the
same problems, as shown in Figures [?] and [?]. Notice that Method 1 shows
particularly bad performance on test functions $F_3$ and $F_4$, which have a concave
or discontinuous Pareto front. However, Method 2 shows quite good performance 
on all the four test functions. 

4.3 Comparison of Different Evolution Strategies for Method 2 

This part of the simulation aims at comparing the performance of different ES 
algorithms for more difficult problems (with higher dimension in this context). 
Fig. 8. Method 2 for $F_2$ (n=2) using the standard ES: (a) generation 10, (b) generation
500.

Fig. 9. Method 2 for $F_3$ (n=2) using the standard ES: (a) generation 10, (b) generation
500.


Since Method 2 exhibits much better performance in the above part of experi- 
ments, only Method 2 will be used in the following simulations. The algorithms 
considered in this work are the standard ES, the ES with Rotation Matrix Adap- 
tation and the ES with Covariance Matrix Adaptation. The problems studied 
in this part of the simulation are the four test functions with n = 30. Results 
using the standard ES on the four functions are shown in Figures ESI and EHl 
The results using the ES with Rotation Matrix Adaptation are given in Figures 
El and El Finally, the ES with Covariance Matrix Adaptation is tested and the 
results are presented in FiguresElandEDl In all the simulations, 500 generations 
are run. 

As shown in the above figures, the standard ES together with Method 2 can 
always provide quite a complete Pareto front, but unfortunately the accuracy of 
the solutions is not satisfactory. On the other hand, the ES with Rotation 
Matrix Adaptation gives consistently good results on all four problems, which 
are comparable to or even better than those of the Pareto-based algorithms 
described in the literature. Interestingly, the ES with CMA produced very good 
results on $F_1$, but failed on $F_3$ and $F_4$. This may be ascribed to the fact 
that the ES with CMA is a more deterministic algorithm that converges quickly 
before it can explore a wider region of the search space. 

Fig. 10. Method 2 for $F_4$ (n=2) using the standard ES: (a) generation 10, (b) generation 
500. 

Fig. 11. Method 1 using the standard ES for (a) $F_1$ (n=10) and (b) $F_2$ (n=10). 

At the same time, as for the low-dimensional problems studied in Section 4.1, 
the ES with RMA is able to obtain a quite complete concave Pareto front at a 
dimension of 30. This was surprising, taking into account the fact that it is 
a simple, dynamic aggregation-based approach. 

5 Conclusion 

Two aggregation-based methods for multiobjective optimization are proposed in 
this paper. The idea is to use dynamic weights instead of fixed weights to 
obtain the Pareto solutions. We found that both methods work well on 
low-dimensional problems. However, for high-dimensional problems, the second 
method outperforms the first one. Furthermore, the simulation results also 
depend on the type and performance of the evolution strategy that is employed. 
In our experiments, the Evolution Strategy with Rotation Matrix Adaptation 
gives better performance than the standard Evolution Strategy. At the same 
time, the Evolution Strategy with Covariance Matrix Adaptation provides very 
good results on smooth, high-dimensional problems, but its performance degrades 
seriously on problems with a discontinuous or non-convex Pareto-optimal front. 

Fig. 12. Method 1 using the standard ES for (a) $F_3$ (n=10) and (b) $F_4$ (n=10). 

Fig. 13. Method 2 using the standard ES for (a) $F_1$ (n=10) and (b) $F_2$ (n=10). 

In our approach, no changes have to be made to the conventional evolutionary 
algorithm except for the dynamic weights and an archive of the found Pareto 
solutions. Therefore, the methods can straightforwardly be applied to all existing 
evolutionary algorithms with only minor modifications. 
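
The archive mentioned above can be illustrated with a minimal sketch. It assumes that all objectives are minimized and that the archive simply keeps every non-dominated objective vector found so far; the function names `dominates` and `update_archive` are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence

Objectives = Sequence[float]


def dominates(a: Objectives, b: Objectives) -> bool:
    """True if a is no worse than b everywhere and strictly better somewhere."""
    no_worse = all(x <= y for x, y in zip(a, b))
    better = any(x < y for x, y in zip(a, b))
    return no_worse and better


def update_archive(archive: List[Objectives],
                   candidate: Objectives) -> List[Objectives]:
    """Offer one candidate to the archive and return the updated archive."""
    # Reject the candidate if an archived point dominates or duplicates it.
    for kept in archive:
        if dominates(kept, candidate) or tuple(kept) == tuple(candidate):
            return list(archive)
    # Otherwise remove every archived point that the candidate dominates.
    survivors = [kept for kept in archive if not dominates(candidate, kept)]
    survivors.append(candidate)
    return survivors
```

Calling `update_archive` once per evaluated individual leaves the optimizer itself unchanged, which is in line with the remark that only minor modifications are needed.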

Another interesting phenomenon is that the proposed algorithms are able to 
find the Pareto solutions with a concave Pareto front. This is a very encourag- 
ing point when we are applying aggregation-based methods to multi-objective 
optimization. Further investigation of this issue will be part of our future work. 

The problems studied in this paper are all two-objective ones. Theoretically, 
the proposed methods can be extended to problems with more than two ob- 
jectives. Expected problems are the increasing complexity and the decreasing 
efficiency, which, however, is also true for the Pareto-based approaches. 



Acknowledgments. The authors would like to thank E. Korner and W. von 
Seelen for their support and T. Arima for his insightful comments. 










Y. Jin, T. Okabe, and B. Sendhoff 










Fig. 14. Method 2 using the standard ES for (a) $F_3$ (n=10) and (b) $F_4$ (n=10). 

Fig. 15. Method 2 using the standard ES for (a) $F_1$ (n=30) and (b) $F_2$ (n=30). 

Fig. 16. Method 2 using the standard ES for (a) $F_3$ (n=30) and (b) $F_4$ (n=30). 

Abstract. This paper addresses the issue of population sizing by presenting a 
novel "incrementing" multi-objective evolutionary algorithm (IMOEA) with 
dynamic population size that is adaptively computed according to the 
on-line discovered trade-off surface and its desired population 
distribution density. It incorporates the method of fuzzy boundary 
local perturbation with interactive local fine-tuning for broader 
neighborhood exploration to achieve better convergence as well as 
to discover any gaps or missing trade-off regions at each generation. 
Comparative studies with other multi-objective (MO) optimization algorithms 
are performed on a benchmark problem. Newly suggested quantitative 
measures, together with other well-known measures, are employed to 
assess and compare their performances statistically. 



1. Introduction 

Evolutionary techniques for MO optimization are currently gaining significant 
attention from researchers in various fields [1-7]. These methods, however, need to 
"guess" an optimal constant population size in order to discover and distribute the 
non-dominated individuals along the usually sophisticated trade-off surface. As 
addressed by Arabas et al. [8], an evolutionary algorithm may suffer from premature 
convergence if the population size is too small. If the population is too large, 
unnecessary computational resources may be consumed and the waiting time for a 
fitness improvement may be too long in practice. Therefore, the selection of an 
appropriate population size in evolutionary optimization is important and could 
greatly affect the effectiveness and efficiency of the optimization [9-13]. In the 
case of single-objective (SO) optimization, various methods of determining an optimal 
population size from different perspectives have been proposed. Grefenstette [14] 
applied a meta genetic algorithm to control the population size of another genetic 
algorithm. Smith [15] and Arabas et al. [8] proposed determining the population size 
adaptively according to the "age" of a chromosome. Zhuang et al. [16] proposed an 
adaptive population size that is adapted to the degree of improvement achieved at 
each generation. 

Unlike these approaches, which are only applicable to single-objective optimization 
[12,14,15,16], this paper generalizes the work in our earlier developed MOEA [5] and 
proposes an incrementing multi-objective evolutionary algorithm (IMOEA) with 
dynamic population size for effective MO optimization. Instead of having a constant 
population to explore the solution space, IMOEA adaptively computes an appropriate 
population size according to the on-line evolved trade-offs and its desired population 
distribution density. This approach reduces the computational effort due to 
unnecessary extra individuals and avoids the premature convergence or incomplete 
trade-offs resulting from an insufficient number of individuals. In addition, the 
IMOEA incorporates a fuzzy boundary local perturbation technique with a dynamic 
number of local perturbations per parent to encourage and reproduce the 
"incrementing" individuals for better MO optimization, whereas any "extra" 
individuals are filtered out through the switching preserved strategy proposed in 
the paper. While maintaining the global search capability, the scheme enhances the 
local exploration and fine-tuning of the evolution at each generation so as to fill 
up any discovered gaps or discontinuities among the non-dominated individuals that 
are loosely located or far away from each other along the trade-off surface. Details 
of the IMOEA with fuzzy boundary local perturbation and other advanced features are 
described in Section 2. Section 3 presents extensive simulations and comprehensive 
quantitative/statistical comparisons of IMOEA with other well-known MO optimization 
algorithms on a benchmark problem. Conclusions are drawn in Section 4. 



2. Incrementing Multi-objective Evolutionary Algorithm 

As addressed in the Introduction, current evolutionary techniques for MO 
optimization face a common difficulty in determining an optimal population size in 
order to efficiently explore the solution space as well as to distribute a sufficient 
number of non-dominated individuals along the trade-off surface. Intuitively, it is 
hard to achieve a good evolution if the population size is too small, with loosely 
distributed individuals leading to insufficient exchange of genetic information. If 
the population size is too large, however, undesirable computational effort may be 
incurred in practice. In single-objective evolutionary optimization, the population 
size is often "guessed" according to the size of the search space in the parameter 
domain [8]. This is, however, not applicable to MO optimization, where the global 
optimum is not a single solution but is often a set of Pareto optimal points covering 
the entire trade-off surface in the objective domain. Further, since the exact 
trade-offs are often unknown a priori, it is difficult to estimate an optimal number 
of individuals necessary for effective exploration of the solution space as well as 
good representation of the trade-off surface. 

The issue of dynamic population in MO optimization currently remains an open 
problem for researchers in the field of evolutionary computation as pointed out by [4]. 
Extending from our earlier work of MOEA [5], an incrementing multi-objective 
evolutionary algorithm is proposed to deal with this problem by adaptively computing 
an appropriate population size at each generation. The population size in IMOEA is 
thus dynamic and is increased or decreased based upon the on-line discovered Pareto- 
front and its desired population distribution density along the trade-offs. 

2.1 Dynamic Population Size 

Instead of "guessing" an optimal population size in MO optimization, an adaptation 
mechanism is employed in IMOEA such that the population size is evolved based 
upon the on-line evolved trade-offs and the required population distribution density, 
which is defined by the user according to his/her preference on how closely the 
non-dominated individuals should be distributed on the Pareto-front. Consider 
an m-dimensional objective space. The desired population size, $dps^{(n)}$, given the 
desired population size per unit volume, $ppv$, and the approximated trade-off 
hyper-area, $A_{to}^{(n)}$ [17], discovered by the population at generation $n$, can be 
defined as 

$$lowbps \le dps^{(n)} = ppv \times A_{to}^{(n)} \le upbps \qquad (1)$$

where $lowbps$ and $upbps$ are the lower and upper bounds for the desired population 
size $dps^{(n)}$, respectively, which can be treated as hard bounds that are optionally 
defined by the user. The trade-off for an m-objective optimization problem takes the 
form of an (m-1)-dimensional hyper-surface, as pointed out by Horn and Nafpliotis 
[18] (note that a 1-dimensional surface is actually a curve, while a 0-dimensional 
surface is a point), which can be approximated by the hyper-surface $A_{to}^{(n)}$ of a 
hyper-sphere as given by [17], 

$$A_{to}^{(n)} \approx \frac{\pi^{(m-1)/2}}{\left(\frac{m-1}{2}\right)!} \left(\frac{d^{(n)}}{2}\right)^{m-1} \qquad (2)$$

where $d^{(n)}$ is the diameter of the hyper-sphere at generation $n$. The above 
estimation of population size is valid for both convex and concave surfaces [17]. 
Furthermore, the same computation procedure for the diameter $d^{(n)}$ can also be 
easily extended to any multi-dimensional objective space [17]. Clearly, eqns. 1 and 2 
provide a simple estimation of the desired population size at each generation 
according to the on-line discovered trade-off surface $A_{to}^{(n)}$ and the desired 
population density $ppv$, which is more efficient and appropriate than guessing an 
optimal population size prior to the optimization process, as adopted in existing 
methods. 
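
The population-size adaptation of eqns. 1 and 2 can be sketched as below. This is only an illustration under stated assumptions: the hyper-sphere approximation is taken to be the standard volume of an (m-1)-dimensional ball of the given diameter, the result is rounded to the nearest integer, and the function and argument names are hypothetical rather than taken from [17].

```python
import math


def tradeoff_area(diameter: float, m: int) -> float:
    """Hyper-sphere approximation of the trade-off hyper-area for m objectives."""
    k = m - 1  # the trade-off surface is (m-1)-dimensional
    # math.gamma(k / 2 + 1) generalizes the factorial (k/2)! to half-integers.
    return math.pi ** (k / 2) / math.gamma(k / 2 + 1) * (diameter / 2) ** k


def desired_population_size(diameter: float, m: int, ppv: float,
                            lowbps: int, upbps: int) -> int:
    """dps = ppv * area, clipped to the hard bounds [lowbps, upbps]."""
    dps = round(ppv * tradeoff_area(diameter, m))  # rounding is an assumption
    return int(min(max(dps, lowbps), upbps))
```

For two objectives (m = 2) the approximated area reduces to the diameter itself, so a normalized trade-off curve of diameter 0.5 with ppv = 100 asks for about 50 individuals.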

2.2 Fuzzy Boundary Local Perturbation 

In this section, a fuzzy boundary local perturbation (FBLP) scheme that perturbs the 
set of non-dominated individuals to produce the necessary "incrementing" individuals 
for the desired population size in IMOEA as given by eqns. 1 and 2 is proposed. In 
brief, the FBLP is implemented for the following objectives: 

1. Produce additional "good" individuals to fill up the gaps or discontinuities 
among existing non-dominated individuals for better representation of the 
Pareto-front, as shown in Fig. 2; 

2. Perform interactive fine-learning to overcome the weakness of local exploration 
in an evolutionary algorithm [19,20] and to achieve better convergence for 
evolutionary MO optimization; 

3. Provide the possibility of perturbation beyond the neighborhood to avoid 
premature convergence or local saturation. 

Fig. 1. The diameter of a trade-off curve 

Fig. 2. FBLP for non-dominated parents with low niched cost (apart from other parents) 

These additional Pareto points can be effectively obtained via the method of fuzzy 
boundary local perturbation at each generation. Note that only parent individuals that 
are selected for reproduction by the tournament selection are perturbed with the 
FBLP, and the selection criterion for the tournament is based solely upon the 
individuals' niched cost in the objective domain [3] instead of the cost of the 
objective functions, in order to encourage uniform distribution. Therefore, parents 
with a low niched cost (located apart from other parents) are given a higher 
probability of being perturbed than those with a high niched cost (located close to 
other parents). Also, the neighborhood boundary for the parents to be perturbed is 
fuzzy in such a way that the probability of perturbation is higher within the 
neighborhood region than outside it [17]. 

In conclusion, the FBLP differs in two aspects from other local search methods in 
single-objective evolutionary optimization [19,21]. The first is that, unlike 
simple mutation, the perturbation probabilities in FBLP vary according to the 
significance of the genes in a chromosome, for ease of implementation and 
computation. In addition, there is no hard boundary on the neighborhood size 
in the perturbation, which gives a non-zero chance of producing offspring that are 
far away from their parents in order to maintain the diversity of evolution. The 
second is that FBLP produces a number of offspring from each selected parent 
without an immediate cost evaluation to determine the acceptance or rejection of 
each perturbed individual, which can be regarded as an extension of mutation 
with at least one (instead of one) locally perturbed offspring per parent. 
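
The two properties described above can be illustrated with a rough sketch. It is not the scheme of [17]: it assumes a chromosome of decimal digits with a fixed number of digits per parameter, and the probability schedule (`base_prob`, `decay`) and the function name `fblp` are hypothetical choices made only for illustration.

```python
import random
from typing import List, Optional


def fblp(parent: List[int], n_offspring: int, digits_per_gene: int = 3,
         base_prob: float = 0.5, decay: float = 0.2,
         rng: Optional[random.Random] = None) -> List[List[int]]:
    """Produce several perturbed copies of one parent without evaluating them."""
    rng = rng or random.Random()
    offspring = []
    for _ in range(n_offspring):
        child = list(parent)
        for pos in range(len(child)):
            # Position inside the parameter: 0 is the most significant digit.
            rank = pos % digits_per_gene
            significance = digits_per_gene - 1 - rank
            # Less significant digits change often (small moves); the most
            # significant digit keeps a small but non-zero chance (far moves).
            prob = base_prob * decay ** significance
            if rng.random() < prob:
                child[pos] = rng.randrange(10)
        offspring.append(child)
    return offspring
```

Because every digit keeps a non-zero perturbation probability, the neighborhood has no hard boundary, and the number of offspring per parent is a free argument rather than being fixed at one.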



2.3 Program Flowchart of IMOEA 

The overall program flowchart of IMOEA is shown in Fig. 3. The dynamic sharing 
method for niched cost estimation from [5] is applied here to provide a simple 
computation of the sharing distance at each generation, which is capable of 
distributing the population uniformly along the Pareto-front without the need for 
a priori knowledge in setting the sharing distance. The detailed procedures within 
the box of special genetic operations for IMOEA in the program flow are given in 
Fig. 4. Instead of simple mutation, fuzzy boundary local perturbation is performed 
to encourage or reproduce the "incrementing" individuals needed for a better 
trade-off representation. While maintaining the global search capability, the scheme 
enhances the local exploration and fine-tuning of the evolution at each generation 
so as to fill up any discovered gaps or discontinuities among the non-dominated 
individuals that are loosely located or far away from each other along the trade-off 
surface. 

As shown in Fig. 3, the evolved population with the desired population size of 
$dps^{(n+1)}$ is combined with the reserved non-dominated individuals at generation 
$n$ to form a combined population whose size is larger than or equal to $dps^{(n+1)}$. 
Individuals in the combined population are then selected for the next generation 
such that a stable and well-distributed Pareto-front can be maintained at each 
generation. For this purpose, a switching preserved strategy (SPS) [17], highlighted 
in the shaded region, which preserves the non-dominated individuals to be evolved 
together with the population, is employed. 



3. Performance Comparisons on Benchmark Problems 

In this section, performance comparisons are performed among various evolutionary 
approaches for multi-objective optimization using both quantitative and statistical 
measures. Besides IMOEA, other well-known algorithms including VEGA from 
Schaffer [9], HLGA from Hajela and Lin [22], NPGA from Horn and Nafpliotis [18], 
MOGA from Fonseca and Fleming [23], NSGA from Srinivas and Deb [2], SPEA 
from Zitzler and Thiele [7] and MOEA from Tan et al. [5] are included in this paper. 
Also, five different performance indexes/measures for MO optimization are employed 
for such comparisons, apart from the usual performance measure based upon the 
final trade-off graphs. Some of these measures are taken from the recent 
literature, while others are carefully designed and added in the paper for a more 
comprehensive comparison. 




Fig. 3. Overall program flowchart with fixed (MOEA) and dynamic (IMOEA) population size 











Incrementing Multi-objective Evolutionary Algorithms 



Special Genetic Operations: 

Let $dps^{(n)}$ = the size of the population at the current generation n 
    $pop^{(n)}$ = the population of the current generation n 

Step 1) From the current $pop^{(n)}$, the desired population size in the next 
        generation is computed according to the adaptation mechanism in eqn. 1: 
        $dps^{(n+1)} = ppv \times A_{to}^{(n)}$ 

Step 2) Perform tournament selection. The selected population is called 
        $selpop^{(n)}$. 

Step 3) Perform simple crossover with crossover probability $p_c$ and 
        mating restriction (Fonseca, 1995) for $selpop^{(n)}$. The resulting 
        population is called $crosspop^{(n)}$. 

Step 4) Perform FBLP. The resulting population is called $evolpop^{(n)}$. 



Fig. 4. Detail procedures within the box of “Special Genetic Operations” in Fig. 3 



Fonseca's two-objective minimization problem is applied as a benchmark 
problem for the performance comparisons. This problem is chosen because it has been 
widely studied by others [5,6,23] and it has a large and non-linear trade-off curve 
that should challenge the MO evolutionary algorithm's ability to find and maintain 
the entire Pareto-front uniformly. Besides, this problem can easily be used for 
visualization and comparison, as adopted by Fonseca and Fleming [23]. The two 
objective functions, $f_1$ and $f_2$, to be minimized are given as 

$$f_1(x_1,\ldots,x_8) = 1 - \exp\left(-\sum_{i=1}^{8}\left(x_i - \frac{1}{\sqrt{8}}\right)^2\right), \quad f_2(x_1,\ldots,x_8) = 1 - \exp\left(-\sum_{i=1}^{8}\left(x_i + \frac{1}{\sqrt{8}}\right)^2\right) \qquad (3)$$

where $-2 \le x_i < 2$, $\forall i = 1, 2, \ldots, 8$. According to eqn. 3, there are 8 
parameters ($x_1, \ldots, x_8$) to be optimized so that $f_1$ and $f_2$ are minimal. The 
trade-off line is shown by the curve in Fig. 5, where the shaded region represents 
the infeasible area in the objective domain. 
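
For reference, the benchmark can be evaluated with a few lines of code. The sketch assumes the commonly used form of the Fonseca and Fleming two-objective problem, in which the shift is 1/sqrt(n) for n parameters (n = 8 here); the function name `fonseca` is hypothetical.

```python
import math
from typing import Sequence, Tuple


def fonseca(x: Sequence[float]) -> Tuple[float, float]:
    """Return (f1, f2) of the Fonseca two-objective minimization problem."""
    shift = 1.0 / math.sqrt(len(x))
    f1 = 1.0 - math.exp(-sum((xi - shift) ** 2 for xi in x))
    f2 = 1.0 - math.exp(-sum((xi + shift) ** 2 for xi in x))
    return f1, f2
```

Under this form, setting all eight parameters to 1/sqrt(8) minimizes the first objective (f1 = 0) at the expense of the second, while the all-zero vector gives equal values for both objectives.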

Five different quantitative interpretations of statistical MO optimization 
performance are applied. Some of them are taken from the literature, while others 
are carefully designed by the authors and added in this paper for a more 
comprehensive comparison. Note that these measures were chosen such that the 
actual Pareto set, which is often obtained through deterministic enumeration and is 
not always practically computable [6], is not needed in the computation. 

i) Ratio of Non-dominated Individuals (RNI) 

This performance measure is denoted here as the ratio of non-dominated 
individuals (RNI) for a given population X and is mathematically formulated as 

$$RNI(X) = \frac{nondom\_indiv}{P} \qquad (4)$$

where nondom_indiv is the number of non-dominated individuals in population X, 
while P is the size of population X. Therefore, the value RNI = 1 means that all the 
individuals in the population are non-dominated, while the opposite, RNI = 0, 
represents the situation where none of the individuals in the population are 
non-dominated. 
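
A minimal sketch of this measure follows, assuming minimization of all objectives and a population given as a list of objective vectors; the helper names `dominates` and `rni` are hypothetical.

```python
from typing import Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b everywhere and strictly better somewhere."""
    no_worse = all(x <= y for x, y in zip(a, b))
    better = any(x < y for x, y in zip(a, b))
    return no_worse and better


def rni(population: Sequence[Sequence[float]]) -> float:
    """Ratio of non-dominated individuals in the population."""
    if not population:
        raise ValueError("population must not be empty")
    nondom = 0
    for i, p in enumerate(population):
        # p is non-dominated if no other member of the population dominates it.
        if not any(dominates(q, p) for j, q in enumerate(population) if j != i):
            nondom += 1
    return nondom / len(population)
```

For example, in the population (1, 4), (2, 2), (3, 3), (4, 1) only (3, 3) is dominated, by (2, 2), so three of the four individuals are non-dominated.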




Fig. 5. Trade-off curve in the objective domain for the test problem 1 

ii) Size of Space Covered (SSC) 

This measure originates from Zitzler and Thiele [7]. Since the original 
definition only accommodates maximization problems, it is generalized in this 
paper for both minimization and maximization as follows. Let $x_i \in X$ be 
the i-th individual member of a population X. The function SSC(X) gives the 
volume enclosed by the union of the areas in the objective domain where any point 
within the area is always dominated by at least one individual $x_i$ in the 
population X. The higher the value of SSC(X), the larger the dominated volume 
in the objective domain and hence the better the population is. 
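
For two objectives the measure can be sketched as an area computation. The sketch below assumes minimization and a user-chosen reference point `ref` that bounds the dominated region; both the reference point and the function name `ssc_2d` are assumptions made for illustration and are not specified in the text.

```python
from typing import Sequence, Tuple


def ssc_2d(points: Sequence[Tuple[float, float]],
           ref: Tuple[float, float]) -> float:
    """Area dominated by the points and bounded by ref (two-objective minimization)."""
    # Only points that are better than the reference point in both objectives count.
    inside = sorted((f1, f2) for f1, f2 in points if f1 < ref[0] and f2 < ref[1])
    area = 0.0
    best_f2 = ref[1]
    for f1, f2 in inside:  # sweep in order of increasing f1
        if f2 < best_f2:
            # This point adds a new horizontal strip below the area covered so far.
            area += (ref[0] - f1) * (best_f2 - f2)
            best_f2 = f2
    return area
```

Dominated points contribute nothing in this sweep, so the value depends only on the non-dominated subset of the population.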

iii) Coverage of Two Sets (C) 

This measure compares the domination of two populations in a pair-wise 
manner, i.e., how well population i dominates population j as well as how well 
population j dominates population i. It was proposed by Zitzler and 
Thiele [7] and is detailed as follows: Let $X', X'' \subseteq X$ be two sets of 
decision vectors. The function C maps the ordered pair $(X', X'')$ to the interval 
[0, 1], given as 

$$C(X', X'') = \frac{\left|\{a'' \in X'' ;\ \exists\, a' \in X' : a' \text{ dominates or equals } a''\}\right|}{|X''|} \qquad (5)$$

The value $C(X', X'') = 1$ indicates that all solutions in $X''$ are dominated by or 
equal to solutions in $X'$, while $C(X', X'') = 0$ means that none of the points in 
$X''$ are covered by the population $X'$. 
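
The coverage measure can be sketched as follows, assuming minimization and sets given as lists of objective vectors. "Covered" is interpreted here as "dominated by or equal to", which is implemented as component-wise no-worse; the function names are hypothetical.

```python
from typing import Sequence


def covers(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a dominates b or equals b (a is no worse in every objective)."""
    return all(x <= y for x, y in zip(a, b))


def coverage(first: Sequence[Sequence[float]],
             second: Sequence[Sequence[float]]) -> float:
    """Fraction of points in `second` covered by at least one point in `first`."""
    if not second:
        raise ValueError("second set must not be empty")
    covered = sum(1 for b in second if any(covers(a, b) for a in first))
    return covered / len(second)
```

The measure is not symmetric, so both coverage(A, B) and coverage(B, A) have to be reported when two algorithms are compared.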

iv) Uniform Distribution (UD) of Non-dominated Population 

Most MO optimization methods attempt to spread the non-dominated 
individuals uniformly throughout the Pareto-front using various types of fitness 
sharing and niching schemes, as discussed in [24]. This paper proposes an 
improved alternative measure of uniform distribution (UD) for the non-dominated 
population. Mathematically, $UD(X')$ for a given set of non-dominated 
individuals $X'$ in a population $X$, where $X' \subseteq X$, is defined as 

$$UD(X') = \frac{1}{1 + S_{nc}} \qquad (6)$$

where $S_{nc}$ is the standard deviation of the niche count of the overall set of 
non-dominated individuals $X'$, and is formulated as 

$$S_{nc} = \sqrt{\frac{\sum_{i=1}^{N_{x'}} \left(nc(x'_i) - \overline{nc}(X')\right)^2}{N_{x'} - 1}} \qquad (7)$$

where $N_{x'}$ is the size of the set $X'$; $nc(x'_i)$ is the niche count of the i-th 
individual $x'_i$, where $x'_i \in X'$; and $\overline{nc}(X')$ is the mean value of 
$nc(x'_i)$, $\forall i = 1, 2, \ldots, N_{x'}$, as shown in the following equations: 

$$nc(x'_i) = \sum_{j=1,\, j \ne i}^{N_{x'}} f(i, j), \quad f(i, j) = \begin{cases} 1, & dis(i, j) < \sigma_{share} \\ 0, & \text{otherwise} \end{cases} \qquad (8)$$

$$\overline{nc}(X') = \frac{\sum_{i=1}^{N_{x'}} nc(x'_i)}{N_{x'}} \qquad (9)$$

where $dis(i, j)$ is the distance between individuals i and j in the objective 
domain and $\sigma_{share}$ is the sharing distance. Note that the main advantage of 
this measure is that the actual Pareto-front is not needed a priori. 
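
A small sketch of the UD measure is given below. It assumes that the niche count of an individual is the number of other individuals lying within a sharing distance `sigma_share` in the objective domain, that distances are Euclidean, and that the sample standard deviation (divisor N - 1) is used; these choices and the function name are assumptions made for illustration.

```python
import math
from typing import Sequence


def uniform_distribution(points: Sequence[Sequence[float]],
                         sigma_share: float) -> float:
    """UD = 1 / (1 + S_nc), with S_nc the standard deviation of the niche counts."""
    n = len(points)
    if n < 2:
        raise ValueError("at least two non-dominated individuals are required")
    # Niche count: number of other individuals closer than sigma_share.
    counts = [
        sum(1 for j, q in enumerate(points)
            if j != i and math.dist(p, q) < sigma_share)
        for i, p in enumerate(points)
    ]
    mean = sum(counts) / n
    s_nc = math.sqrt(sum((c - mean) ** 2 for c in counts) / (n - 1))
    return 1.0 / (1.0 + s_nc)
```

When every individual has the same niche count the deviation is zero and UD reaches its maximum of 1; clustered individuals raise the deviation and lower UD.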

v) Algorithm Effort (AE) 

The performance in MO optimization is often evaluated not only in terms of how 
good the final Pareto-front is, but also in terms of the computational time 
required to obtain such optimal solutions, which includes the effort of 
performing genetic operations, ranking, fitness sharing, etc. Generally, the 
algorithm effort (AE) can be defined as the ratio of a fixed period of simulation 
time, $T_{run}$, to the total number of function evaluations, $N_{eval}$, performed 
within that period, as given by 

$$AE = \frac{T_{run}}{N_{eval}} \qquad (10)$$

As shown in eqn. 10, for a fixed period $T_{run}$, a larger number of function 
evaluations indicates indirectly that less computational effort is required by the 
optimization algorithm, and hence results in a smaller AE. 

For uniform comparisons, the sharing distance for MOGA, NSGA, NPGA and HLGA, 
as well as for the performance measure UD for all methods, is set to 0.01 in the 
normalized space, since the population size was set at 100. Since dynamic sharing [5] 
was used for both MOEA and IMOEA, their sharing distance is computed dynamically 
at each generation, whereas no distance parameter is needed for SPEA, as proposed by 
Zitzler and Thiele [7]. A tournament selection scheme with a tournament size of 2 is 
used in MOGA, SPEA, MOEA and IMOEA, as suggested in their original literature. The 
Pareto tournament selection scheme with $t_{dom}$ = 10% of the population size is used 
in NPGA for a tight and complete population distribution, as recommended by [1]. In 
order to guarantee a fair comparison, all algorithms considered are implemented with 
the same coding scheme, crossover and mutation. Note that each parameter is 
represented by a 3-digit decimal number, and the parameters are concatenated to form 
the chromosomes. In all cases, standard mutation with a probability of 0.01 and 
standard two-point crossover with a probability of 0.7 are used. For IMOEA, the 
parameters ppv and upbps are set to 100 in order to distribute the non-dominated 
individuals with a density of 100 individuals per unit size of the normalized 
trade-off region. 

All methods under comparison were implemented with the same common 
sub-functions using the same programming language in Matlab [25] on an Intel 
Pentium II 450 MHz computer. Each simulation was terminated automatically when a 
fixed pre-specified simulation period (for each test problem) was reached, on the 
same platform, free from other computation or interruption by other programs. 
The period for all algorithms being compared is 180 sec. Thirty independent 
simulation runs were performed for each method on each test problem so as to study 
statistical performance such as the consistency and robustness of the methods. Here, 
a random initial population was created for each of the 30 runs, with an initial 
population size of 100 for all methods under study except SPEA and IMOEA. For 
SPEA, according to [7], three combinations of {P, P'}, namely {95, 5}, {70, 30} 
and {30, 70}, where P + P' = 100 in each case, are used. To test the ability of IMOEA 
to automatically adapt the population size to discover the Pareto-front, an initial 
population size of only 10 (instead of 100) was randomly generated for each of the 30 
runs in each test problem. The indexes of the compared algorithms on each test 
problem are in the sequence: VEGA, HLGA, NPGA, MOGA, NSGA, SPEA1, SPEA2, SPEA3, 
MOEA and IMOEA, where SPEA1, SPEA2 and SPEA3 represent the {P, P'} combinations of 
{95, 5}, {70, 30} and {30, 70}, respectively. 

Fig. 6 summarizes the simulation results for the performance of each algorithm with 
respect to each performance measure. The distribution of the simulation data of the 
30 independent runs is represented in box plot format [26], which has been applied by 
[7] to visualize the distribution of simulation data efficiently. With respect to the 
number of function evaluations and the algorithm effort (AE) per run under a fixed 
CPU time, it can be seen from the first and second rows of graphs in Fig. 6 that 
VEGA, HLGA and NPGA have a relatively high number of function evaluations and a low 
AE. This indicates that these algorithms are computationally less expensive than the 
others, and hence more iterations were performed per run within the fixed period of 
CPU time. 

Fig. 6. Box plots based on the measures of AE, RNI, SSC and UD 

For the measure of the ratio of non-dominated individuals (RNI), it is noticed that 
MOEA and IMOEA have relatively high mean values of RNI, which clearly indicates 
their ability to provide useful non-dominated solutions from a given population size 
due to the incorporation of the switching preserved strategy (SPS) and fuzzy boundary 
local perturbation (FBLP), where more offspring are perturbed in the neighborhood of 
the non-dominated individuals. The value of RNI in SPEA depends mainly on the 
setting of {P, P'}: the higher the ratio P'/(P + P'), the higher the measure of RNI 
and, as a consequence, the higher the computational cost of the clustering algorithm 
in SPEA that filters out the extra external individuals. 

Concerning the measure of size of space covered (SSC), the performance of IMOEA 
is outstanding. It is also evident that MOEA is the second best in terms of SSC 
for all test problems. In the context of the uniform distribution (UD) of the 
non-dominated population, VEGA gives the lowest value of UD, indicating that it has 
a low ability to distribute the non-dominated individuals evenly, since it does not 
have any operation that takes care of the population distribution. Also, IMOEA is 
shown to be the best in the measure of UD, followed by MOEA with a good performance 
slightly below that of IMOEA. This difference is probably due to the additional 
operation of FBLP in IMOEA, which provides the local fine-tuning to discover any 
gaps or missing trade-offs at each generation. Besides IMOEA and MOEA, SPEA is also 
shown to perform well in the measure of UD. 

The performance measures $C(X_i, X_j)$ for the comparison sets between algorithms 
i and j, where i, j = 1, 2, ..., 10 on test problem 1 and i, j = 1, 2, ..., 8 on test 
problems 2 and 3, are shown in Fig. 7. Again, box plots are used to summarize the 
sample distributions of the 30 independent runs for each case. The ranges of the y- 
and x-axes of each graph are [0, 1] and [1, 10], respectively. It is noticeable 
(concerning the rectangles $C(X_{10}, X_j)$) that IMOEA dominates the other 
algorithms most clearly in the respective test problems. Also, IMOEA is dominated 
the least by the other algorithms in all rectangles. 




Fig. 8 shows the distribution of non-dominated individuals in the objective 
domain. These distributions are the best ones, among the 30 independent runs, with 
respect to the measure of SSC. In general, the purpose of producing these figures is 
to visually inspect the performance of the various algorithms in terms of the 
population distribution at the final generation, since the quantitative performance 
measures alone may not provide enough information for full observation and 
understanding. 



4. Conclusions 

A novel incrementing multi-objective evolutionary algorithm that has a dynamic 
population size based upon the on-line discovered Pareto-front and its desired 
population distribution density has been proposed. The algorithm implements the 
concept of Pareto optimal domination with an adaptive niche induction technique to 
evolve a set of non-dominated individuals uniformly distributed along the trade-off 
surface. In addition, the method of fuzzy boundary local perturbation with dynamic 
local fine-tuning is incorporated to achieve broader neighborhood exploration as well 
as to eliminate any gaps or discontinuities along the Pareto-front for better 
convergence and trade-off representation. Furthermore, extensive quantitative 
comparisons between IMOEA and other well-known MO optimization methods on 
benchmark problems have been performed in this paper. The results obtained show that 
IMOEA performs well in searching for and maintaining non-dominated solutions that 
are uniformly distributed along the global Pareto-front, with no significant 
additional computational effort as compared to the others. 

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Abstract. In this paper, we propose a multiobjective optimization ap- 
proach based on a micro genetic algorithm (micro-GA) which is a genetic 
algorithm with a very small population (four individuals were used in our 
experiment) and a reinitialization process. We use three forms of elitism 
and a memory to generate the initial population of the micro-GA. Our 
approach is tested with several standard functions found in the special- 
ized literature. The results obtained are very encouraging, since they 
show that this simple approach can produce an important portion of the 
Pareto front at a very low computational cost. 



1 Introduction 



A considerable number of evolutionary multiobjective optimization techniques
have been proposed in the past [?]. However, until recently, little emphasis
had been placed on efficiency issues. It is well known that the two main processes 
that consume most of the running time of an EMOO algorithm are: the ranking 
of the population (i.e., the comparisons required to determine non-dominance) 
and the mechanism to keep diversity (evolutionary algorithms tend to converge 
to a single solution because of stochastic noise; it is therefore necessary to avoid 
this with an additional mechanism). 

Recent research has shown ways of improving the efficiency of an EMOO
technique (see for example [?]). The main emphasis has been on using an
external file that stores nondominated vectors found during the evolutionary
process, which are reinserted later in the population (this can be seen as a
form of elitism in the context of multiobjective optimization [?]).

Following the same line of thought of this current research, we decided to
develop an approach in which we would use a GA with a very small population
size and a reinitialization process (the so-called micro-GA) combined with an
external file to store nondominated vectors previously found. Additionally, we
decided to include an efficient mechanism to keep diversity (similar to the
adaptive grid method of Knowles & Corne [?]). Our motivation was to show that a
carefully designed micro-GA is sufficient to generate the Pareto front of a
multiobjective optimization problem. Such an approach not only reduces the
number of comparisons required to generate the Pareto front (with respect to
traditional EMOO approaches based on Pareto ranking), but also allows us to
control the number of points that we wish to obtain from the Pareto front
(this number is in fact a parameter of our algorithm).

2 Related Work 

The term micro-genetic algorithm (micro-GA) refers to a small-population
genetic algorithm with reinitialization. The idea was suggested by some
theoretical results obtained by Goldberg [?], according to which a population
size of 3 was sufficient to converge, regardless of the chromosome length. The
process suggested by Goldberg was to start with a small randomly generated
population, then apply the genetic operators to it until reaching nominal
convergence (e.g., when all the individuals have genotypes that are either
identical or very similar), and then to generate a new population by
transferring the best individuals of the converged population to the new one.
The remaining individuals would be randomly generated.
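
The restart scheme described above can be sketched for a single-objective
problem as follows. This is only an illustrative sketch under several
assumptions that are not taken from the text: individuals are bit lists,
fitness is maximized, nominal convergence is replaced by a fixed number of
inner generations, one-point crossover is used, and only a single best
individual is carried over. The name `micro_ga` and all parameter names are
hypothetical.

```python
import random
from typing import Callable, List

Bits = List[int]


def micro_ga(fitness: Callable[[Bits], float], n_bits: int, pop_size: int = 5,
             cycles: int = 50, inner_gens: int = 5, seed: int = 0) -> Bits:
    """Maximize `fitness` with a tiny population that is restarted every cycle."""
    rng = random.Random(seed)

    def random_individual() -> Bits:
        return [rng.randint(0, 1) for _ in range(n_bits)]

    best = random_individual()
    for _ in range(cycles):
        # Restart: carry the best individual over, randomize the rest.
        pop = [best[:]] + [random_individual() for _ in range(pop_size - 1)]
        for _ in range(inner_gens):  # assumed stand-in for nominal convergence
            elite = max(pop, key=fitness)
            children = [elite[:]]  # elitism: the best string survives intact
            while len(children) < pop_size:
                # Two binary tournaments pick the parents.
                p1 = max(rng.sample(pop, 2), key=fitness)
                p2 = max(rng.sample(pop, 2), key=fitness)
                cut = rng.randint(1, n_bits - 1)  # one-point crossover
                children.append(p1[:cut] + p2[cut:])
            pop = children
        best = max(pop + [best], key=fitness)
    return best
```

No mutation is applied in this sketch; new genetic material enters only
through the random individuals created at each restart.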

The first to report an implementation of a micro-GA was Krishnakumar [?],
who used a population size of 5, a crossover rate of 1 and a mutation rate of
zero. His approach also adopted an elitist strategy that copied the best string
found in the current population to the next generation. Selection was
performed by holding 4 competitions between strings that were adjacent in the
population array, and declaring the individual with the highest fitness as the
winner. Krishnakumar [?] compared his micro-GA against a simple GA (with a
population size of 50, a crossover rate of 0.6 and a mutation rate of 0.001).
Krishnakumar [?] reported faster and better results with his micro-GA on two
stationary functions and a real-world engineering control problem (a
wind-shear controller task). After him, several other researchers have
developed applications of micro-GAs [?]. However, to the best of our
knowledge, the current paper reports the first attempt to use a micro-GA for
multiobjective optimization, although some may argue that the multi-membered
versions of PAES can be seen as a form of micro-GA [?]. However, Knowles &
Corne [?] concluded that the addition of a population did not, in general,
improve the performance of PAES, and increased the computational overhead
significantly. Our technique, on the other hand, uses a population and
traditional genetic operators and, as we will show in a later section, it
performs quite well.

^1 We recently became aware of the fact that Jaszkiewicz [?] proposed an
approach in which a small population is initialized from a large external
memory and utilized for a short period of time. However, to the best of our
knowledge, this approach has been used only for multiobjective combinatorial
optimization.

3 Description of the Approach

In this paper, we propose a micro-GA with two memories: the population
memory, which is used as the source of diversity of the approach, and the
external memory, which is used to archive members of the Pareto optimal
set. The population memory is in turn divided into two parts: a replaceable
and a non-replaceable portion (the percentages of each can be regulated by the
user).

The way in which our technique works is illustrated in Fig. 1. First, an
initial random population is generated. This population feeds the population
memory, which is divided into two parts as indicated before. The
non-replaceable portion of the population memory will never change during the
entire run and is meant to provide the required diversity for the algorithm.
The initial population of the micro-GA at the beginning of each of its cycles
is taken (with a certain probability) from both portions of the population
memory so as to allow a greater diversity.

During each cycle, the micro-GA undergoes conventional genetic operators:
tournament selection, two-point crossover, uniform mutation, and elitism
(regardless of the number of nondominated vectors in the population, only one
is arbitrarily selected at each generation and copied intact to the following
one). After the micro-GA finishes one cycle (i.e., when nominal convergence is
achieved), we choose two nondominated vectors^2 from the final population (the
first and last) and compare them with the contents of the external memory
(this memory is initially empty). If either of them (or both) remains
nondominated after comparing against the vectors in this external memory, then
they are included there. All the dominated vectors are eliminated from the
external memory. These two vectors are also compared against two elements from
the replaceable portion of the population memory (this is done with a loop, so
that each vector is only compared against a single position of the population
memory). If either of these vectors dominates its match in the population
memory, then it replaces it. The idea is that, over time, the replaceable part
of the population memory will tend to have more nondominated vectors. Some of
them will be used in the initial population of the micro-GA to start new
evolutionary cycles.
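
The comparison against the external memory described above can be sketched as
follows. This is a minimal illustration, assuming that all objectives are
minimized and that objective vectors are stored as tuples; the names
`dominates` and `update_archive` are hypothetical, and the handling of a full
memory is deliberately left out.

```python
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if a Pareto-dominates b (all objectives minimized)."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def update_archive(archive: List[Vector], candidate: Vector) -> List[Vector]:
    """Insert the candidate if no archived vector dominates it.

    Archived vectors dominated by the candidate are removed. An exact
    duplicate of an archived vector is rejected. No capacity limit is applied.
    """
    if any(dominates(old, candidate) or old == candidate for old in archive):
        return archive
    kept = [old for old in archive if not dominates(candidate, old)]
    kept.append(candidate)
    return kept
```

Each insertion costs one pass over the archive, so the number of dominance
comparisons per cycle is bounded by the size of the external memory rather
than by a large population.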

Our approach uses three types of elitism. The first is based on the notion that 
if we store the nondominated vectors produced from each cycle of the micro-GA, 
we will not lose any valuable information obtained from the evolutionary process. 
The second is based on the idea that if we replace the population memory by 
the nominal solutions (i.e., the best solutions found when nominal convergence 
is reached), we will gradually converge, since crossover and mutation will have 
a higher probability of reaching the true Pareto front of the problem over time. 
This notion was hinted at by Goldberg [?]. Nominal convergence, in our case,
is defined in terms of a certain (low) number of generations (typically, two
to five in our case). However, similarities among the strings (either at the
phenotypical or genotypical level) could also have been used as a criterion
for convergence. The third type of elitism is applied at certain intervals
(defined by a parameter called "replacement cycle"). What we do is to take a
certain number of points from all the regions of the Pareto front generated so
far and use them to fill in the replaceable memory. Depending on the size of
the replaceable memory, we choose as many points from the Pareto front as
necessary to guarantee a uniform distribution. This process intends to use the
best solutions generated so far as the starting point for the micro-GA, so
that we can improve them (either by getting closer to the true Pareto front or
by getting a better distribution). This also prevents the contents of the
replaceable memory from becoming homogeneous.

^2 This is assuming that we have two or more nondominated vectors. If there is
only one, then this vector is the only one selected.

Fig. 1. Diagram that illustrates the way in which our micro-GA works.

To keep diversity in the Pareto front, we use an approach similar to the
adaptive grid proposed by Knowles & Corne [?]. The idea is that once the
archive that stores nondominated solutions has reached its limit, we divide
the search space that this archive covers, assigning a set of coordinates to
each solution. Then, each newly generated nondominated solution will be
accepted only if the geographical location to which the individual belongs is
less populated than the most crowded location. Alternatively, the new
nondominated solution could also be accepted if the individual belongs to a
location outside the previously specified boundaries. In other words, the less
crowded regions are given preference so that the spread of the individuals on
the Pareto front can be more uniform.
The pseudo-code of the algorithm is the following:

function Micro-GA
begin
    Generate starting population P of size N
        and store its contents in the population memory M
    /* Both portions of M will be filled with random solutions */
    i = 0
    while i < Max do
    begin
        Get the initial population for the micro-GA (P_i) from M
        repeat
        begin
            Apply binary tournament selection
                based on nondominance
            Apply two-point crossover and uniform mutation
                to the selected individuals
            Apply elitism (retain only one nondominated vector)
            Produce the next generation
        end
        until nominal convergence is reached
        Copy two nondominated vectors from P_i
            to the external memory E
        if E is full when trying to insert ind_b
            then adaptive_grid(ind_b)
        Copy two nondominated vectors from P_i to M
        if i mod replacement_cycle = 0
            then apply third form of elitism
        i = i + 1
    end while
end function



The adaptive grid requires two parameters: the expected size of the Pareto
front and the number of positions into which we will divide the solution space
for each objective. The first parameter is defined by the size of the external
memory and is provided by the user. The second parameter (the number of
positions into which we will divide the solution space) has to be provided by
the user as well, although we have found that our approach is not very
sensitive to it (e.g., in most of our experiments a value of 15 or 25 provided
very similar results). The process of determining the location of a certain
individual has a low computational cost (it is based on the values of its
objectives as indicated before). However, when the individual is out of range,
we have to relocate all the positions. Nevertheless, this last situation does
not occur too often, and we allocate a certain amount of extra room in the
first and last locations of the grid to minimize the occurrence of this
situation.

When the external memory is full, we use the adaptive grid to decide which
nondominated vectors will be eliminated. The adaptive grid will try to balance
the number of individuals representing each of the elements of the Pareto
set, so that we can get a uniform spread of points along the Pareto front.
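
A simplified version of this grid-based replacement can be sketched as
follows. The sketch is not the authors' implementation: it recomputes the grid
bounds from the archive plus the candidate on every call (instead of keeping
extra room at the borders), and it removes the first member found in the most
crowded cell. The names `grid_cell` and `insert_when_full` are hypothetical.

```python
from collections import Counter
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]


def grid_cell(point: Sequence[float], lows: Sequence[float],
              highs: Sequence[float], divisions: int) -> Tuple[int, ...]:
    """Map an objective vector to the integer coordinates of its grid cell."""
    cell = []
    for value, low, high in zip(point, lows, highs):
        width = (high - low) / divisions or 1.0  # guard against zero width
        index = int((value - low) / width)
        cell.append(min(max(index, 0), divisions - 1))  # clamp to the grid
    return tuple(cell)


def insert_when_full(archive: List[Vector], candidate: Vector,
                     divisions: int) -> List[Vector]:
    """Accept the candidate only if its cell is less crowded than the
    most crowded cell, evicting one member of that crowded cell."""
    points = archive + [candidate]
    n_obj = len(candidate)
    lows = [min(p[k] for p in points) for k in range(n_obj)]
    highs = [max(p[k] for p in points) for k in range(n_obj)]
    cells = [grid_cell(p, lows, highs, divisions) for p in archive]
    counts = Counter(cells)
    crowded_cell, crowded_count = counts.most_common(1)[0]
    if counts[grid_cell(candidate, lows, highs, divisions)] >= crowded_count:
        return archive  # the candidate would not improve the spread
    victim = cells.index(crowded_cell)
    return archive[:victim] + archive[victim + 1:] + [candidate]
```

Locating an individual only needs one division per objective, which is why
the text describes this step as having a low computational cost.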

4 Comparison of Results 

Several test functions were taken from the specialized literature to compare our 
approach. In all cases, we generated the true Pareto fronts of the problems using 
exhaustive enumeration (with a certain granularity) so that we could make a 
graphical comparison of the quality of the solutions produced by our micro-GA. 

Since the main aim of this approach has been to increase efficiency, we
additionally decided to compare the running times of our micro-GA against
those of the NSGA II [?] and PAES [?]. In the following examples, the NSGA II
was run using a population size of 100, a crossover rate of 0.8 (using SBX),
tournament selection, and a mutation rate of 1/vars, where vars = number of
decision variables of the problem. In the following examples, PAES was run
using a depth of 5, an archive size of 100, and a mutation rate of 1/bits,
where bits refers to the length of the chromosomic string that encodes the
decision variables. The number of fitness function evaluations was set such
that the NSGA II, PAES and the micro-GA could reasonably cover the true Pareto
front of each problem.

4.1 Test Function 1 

Our first example is a two-objective optimization problem defined by Deb [?]:

    Minimize f_1(x_1, x_2) = x_1                                          (1)

    Minimize f_2(x_1, x_2) = \frac{g(x_2)}{x_1}                           (2)

where:

    g(x_2) = 2.0 - \exp\left\{ -\left( \frac{x_2 - 0.2}{0.004} \right)^2 \right\}
                 - 0.8 \exp\left\{ -\left( \frac{x_2 - 0.6}{0.4} \right)^2 \right\}   (3)

and 0.1 \le x_1 \le 1.0, 0.1 \le x_2 \le 1.0.




Fig. 2. Comparison of results for the first test function. 
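
For readers who wish to reproduce the first test function, Eqs. (1)-(3) can be
evaluated as sketched below. The constants inside g (0.2, 0.004, 0.6 and 0.4)
are assumed from the commonly cited form of Deb's bimodal problem, and the
names `g` and `deb_bimodal` are hypothetical.

```python
import math
from typing import Tuple


def g(x2: float) -> float:
    """Bimodal function g of Eq. (3): a narrow global basin near x2 = 0.2
    and a wide local basin near x2 = 0.6."""
    return (2.0
            - math.exp(-((x2 - 0.2) / 0.004) ** 2)
            - 0.8 * math.exp(-((x2 - 0.6) / 0.4) ** 2))


def deb_bimodal(x1: float, x2: float) -> Tuple[float, float]:
    """Return (f1, f2) of Eqs. (1)-(2); both objectives are minimized."""
    return x1, g(x2) / x1
```

Because the basin around x2 = 0.2 is very narrow compared with the one around
x2 = 0.6, a search can easily settle on the local front, which is the
difficulty this test function is meant to expose.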



The parameters used by the micro-GA for this example were: size of the ex- 
ternal memory =100, size of the population memory = 50, number of iterations 
= 1500, number of iterations of the micro-GA (to achieve nominal convergence) 
= 2, number of subdivisions of the adaptive grid = 25, crossover rate = 0.7, 
mutation rate = 0.029, percentage of non-replaceable memory = 0.3, population 
size (of the micro-GA) = 4, replacement cycle at every 50 iterations. 

Our first test function has a local Pareto front to which a GA can be easily
attracted. Fig. 2 shows the true Pareto front for this problem with a
continuous line, and the results found by the NSGA II, PAES and our micro-GA
are shown with points. Similar fronts were found by the three approaches. For
this example, both the NSGA II and PAES performed 12,000 evaluations of the
fitness function. The average running times of the algorithms (over 20 runs)
were the following: 2.601 seconds for the NSGA II (with a standard deviation
of 0.33555913), 1.106 seconds for PAES (with a standard deviation of
0.25193672) and only 0.204 seconds for the micro-GA (with a standard deviation
of 0.07764461).



A Micro-Genetic Algorithm for Multiobjective Optimization 133 



4.2 Test Function 2 

Our second example is a two-objective optimization problem proposed by
Schaffer [?] that has been used by several researchers [?]:

    Minimize f_1(x) = \begin{cases}
        -x      & \text{if } x \le 1 \\
        -2 + x  & \text{if } 1 < x \le 3 \\
        4 - x   & \text{if } 3 < x \le 4 \\
        -4 + x  & \text{if } x > 4
    \end{cases}                                                           (4)

    Minimize f_2(x) = (x - 5)^2                                           (5)

and -5 \le x \le 10.

Fig. 3. Comparison of results for the second test function.



The parameters used for this example were: size of the external memory =
100, size of the population memory = 50, number of iterations = 150, number
of iterations of the micro-GA (to achieve nominal convergence) = 2, number of
subdivisions of the adaptive grid = 25, crossover rate = 0.7, mutation rate =
0.056 (1/L, where L = 18 bits in this case), percentage of non-replaceable
memory = 0.3, population size (of the micro-GA) = 4, replacement cycle at
every 25 iterations.

This problem has a Pareto front that is disconnected. Fig. 3 shows the true
Pareto front for this problem with a continuous line (the vertical line is
obviously not part of the true Pareto front, but it appears because we used
linear segments to connect every pair of nondominated points). We used points
to represent the solutions found by the NSGA II, PAES and our micro-GA.
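
The two objectives of this test function are simple to evaluate, as the
following sketch shows. The interval boundaries (closed on the right) are an
assumption about the piecewise definition of f_1, and the names `schaffer_f1`
and `schaffer_f2` are hypothetical.

```python
def schaffer_f1(x: float) -> float:
    """Piecewise-linear first objective of the second test function."""
    if x <= 1.0:
        return -x
    if x <= 3.0:
        return -2.0 + x
    if x <= 4.0:
        return 4.0 - x
    return -4.0 + x


def schaffer_f2(x: float) -> float:
    """Quadratic second objective, minimal at x = 5."""
    return (x - 5.0) ** 2
```

On the segment where f_1 rises while f_2 still decreases, points can be
dominated by solutions from other segments, which is one way to see why the
nondominated set splits into separate pieces.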

Again, similar Pareto fronts were found by the three approaches. For this
example, both the NSGA II and PAES performed 1,200 evaluations of the fitness
function. The average running times of the algorithms (over 20 runs) were the
following: 0.282 seconds for the NSGA II (with a standard deviation of
0.00014151), 0.107 seconds for PAES (with a standard deviation of 0.13031718)
and only 0.017 seconds for the micro-GA (with a standard deviation of
0.0007672).

4.3 Test Function 3

Our third example is a two-objective optimization problem defined by Deb [?]:

    Minimize f_1(x_1, x_2) = x_1                                          (6)

    Minimize f_2(x_1, x_2) = g(x_1, x_2) \cdot h(x_1, x_2)                (7)

where:

    g(x_1, x_2) = 11 + x_2^2 - 10 \cdot \cos(2 \pi x_2)                   (8)

    h(x_1, x_2) = \begin{cases}
        1 - \sqrt{\frac{f_1(x_1, x_2)}{g(x_1, x_2)}}
            & \text{if } f_1(x_1, x_2) \le g(x_1, x_2) \\
        0   & \text{otherwise}
    \end{cases}                                                           (9)

and 0 \le x_1 \le 1, -30 \le x_2 \le 30.

This problem has 60 local Pareto fronts. Fig. 4 shows the true Pareto front
for this problem with a continuous line. The results obtained by the NSGA II,
PAES and our micro-GA are displayed as points.
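
The third test function can be evaluated as sketched below, assuming the
multimodal form g = 11 + x_2^2 - 10 cos(2 pi x_2) and h = 1 - sqrt(f_1 / g)
given in Eqs. (8)-(9); the name `deb_multimodal` is hypothetical.

```python
import math
from typing import Tuple


def deb_multimodal(x1: float, x2: float) -> Tuple[float, float]:
    """Return (f1, f2) for the third test function; both are minimized."""
    g = 11.0 + x2 ** 2 - 10.0 * math.cos(2.0 * math.pi * x2)
    # h falls to zero when f1 exceeds g; inside the stated bounds g >= 1 >= x1.
    h = 1.0 - math.sqrt(x1 / g) if x1 <= g else 0.0
    return x1, g * h
```

The cosine term makes g oscillate in x_2, so every local minimum of g yields
its own local front, with the global one obtained at x_2 = 0 where g = 1.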

The parameters used by the micro-GA for this example were: size of the ex- 
ternal memory = 100, size of the population memory = 50, number of iterations 
= 700, number of iterations of the micro-GA (to achieve nominal convergence) 
= 4, number of subdivisions of the adaptive grid = 25, crossover rate = 0.7, 
mutation rate = 0.029, percentage of non-replaceable memory = 0.3, population 
size (of the micro-GA) = 4, replacement cycle at every 50 iterations. 

Once again, the fronts produced by the three approaches are very similar.
For this example, both the NSGA II and PAES performed 11,200 evaluations
of the fitness function. The average running times of the algorithms (over 20
runs) were the following: 2.519 seconds for the NSGA II (with a standard
deviation of 0.03648403), 2.497 seconds for PAES (with a standard deviation of
1.03348519) and only 0.107 seconds for the micro-GA (with a standard deviation
of 0.00133949).

Fig. 4. Comparison of results for the third test function.

4.4 Test Function 4 

Our fourth example is the so-called "unitation versus pairs" problem, which
involves the maximization of two functions over bit strings. The first
function, f_1, is the number of pairs of adjacent complementary bits found in
the string, and the second function, f_2, is the number of ones found in the
string. The Pareto front in this case is discrete. We used a string length of
28, and therefore, the true Pareto front is composed of 15 points.
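
Both objectives are easy to compute directly from the bit string, as the
following sketch shows. Bit strings are assumed to be lists of 0/1 integers,
and the names `pairs` and `unitation` are hypothetical.

```python
from typing import List, Tuple


def pairs(bits: List[int]) -> int:
    """Count adjacent positions that hold complementary bits."""
    return sum(1 for a, b in zip(bits, bits[1:]) if a != b)


def unitation(bits: List[int]) -> int:
    """Count the ones in the string."""
    return sum(bits)


def evaluate(bits: List[int]) -> Tuple[int, int]:
    """Return (f1, f2); both objectives are maximized."""
    return pairs(bits), unitation(bits)
```

For example, an alternating string of 28 bits reaches the maximum of 27 pairs
with 14 ones, whereas the all-ones string has 28 ones and no pairs; these are
the two extremes between which the trade-off lies.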

The parameters used for this example were: size of the external memory = 
100, size of the population memory = 15, number of iterations = 1250, number 
of iterations of the micro-GA (to achieve nominal convergence) = 1, number 
of subdivisions of the adaptive grid = 3, crossover rate = 0.5, mutation rate 
= 0.035, percentage of non-replaceable memory = 0.2, population size (of the 
micro-GA) = 4, replacement cycle at every 25 iterations. 

Fig. 5 shows the results obtained by our micro-GA for the fourth test
function. A total of 13 (out of 15) elements of the Pareto optimal set were
found on average (only occasionally was our approach able to find the 15
target elements). PAES was also able to generate 13 elements of the Pareto
optimal set on average, and the NSGA II was only able to generate 8 elements
on average.

Fig. 5. Results of the micro-GA for the fourth test function.



For this example, both the NSGA II and PAES performed 5,000 evaluations
of the fitness function. The average running times of the algorithms (over 20
runs) were the following: 2.207 seconds for the NSGA II, 0.134 seconds for
PAES and only 0.042 seconds for the micro-GA.

Borges & Barbosa [2] reported that they were able to find the 15 elements of
the Pareto optimal set for this problem, using a population size of 100 and
5,000 evaluations of the fitness function, although no actual running times of
their approach were reported.

4.5 Test Function 5

Our fifth example is a two-objective optimization problem defined by Kursawe
[?]:

    Minimize f_1(x) = \sum_{i=1}^{n-1} \left( -10 \exp\left( -0.2 \sqrt{x_i^2 + x_{i+1}^2} \right) \right)   (10)

    Minimize f_2(x) = \sum_{i=1}^{n} \left( |x_i|^{0.8} + 5 \sin(x_i)^3 \right)   (11)

where:

    -5 \le x_1, x_2, x_3 \le 5                                            (12)

Fig. 6. Comparison of results for the fifth test function.



Fig. 6 shows the true Pareto front for this problem as points. The results
obtained by the NSGA II, PAES and our micro-GA are also shown as points.
It is worth mentioning that PAES could not eliminate some of the dominated
points in the runs performed.
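
Kursawe's function can be evaluated as sketched below. The form of f_1 is
assumed from the commonly cited definition of this problem, the exponent in
f_2 is read as applying to the value of the sine, and the name `kursawe` is
hypothetical.

```python
import math
from typing import Sequence, Tuple


def kursawe(x: Sequence[float]) -> Tuple[float, float]:
    """Return (f1, f2) for Kursawe's problem; both are minimized."""
    # f1 couples each variable with its successor.
    f1 = sum(-10.0 * math.exp(-0.2 * math.sqrt(a * a + b * b))
             for a, b in zip(x, x[1:]))
    # f2 is separable and multimodal because of the sine term.
    f2 = sum(abs(a) ** 0.8 + 5.0 * math.sin(a) ** 3 for a in x)
    return f1, f2
```

With three decision variables, f_1 has two summands and f_2 has three, so the
smallest possible value of f_1 is -20, attained at the origin.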

The parameters used for this example were: size of the external memory = 
100, size of the population memory = 50, number of iterations = 3000, number 
of iterations of the micro-GA (to achieve nominal convergence) = 2, number 
of subdivisions of the adaptive grid = 25, crossover rate = 0.7, mutation rate 
= 0.019, percentage of non-replaceable memory = 0.3, population size (of the 
micro-GA) = 4, replacement cycle at every 50 iterations. 

For this example, both the NSGA II and PAES performed 2,400 evaluations
of the fitness function. The average running times of the algorithms (over 20
runs) were the following: 6.481 seconds for the NSGA II (with a standard
deviation of 0.053712), 2.195 seconds for PAES (with a standard deviation of
0.25408319) and only 0.704 seconds for the micro-GA (with a standard deviation
of 0.00692099).



5 Conclusions and Future Work 

We have introduced an approach that uses a GA with a very small population
and a reinitialization process to generate the Pareto front of a multiobjective
optimization problem. The technique has exhibited a low computational cost
when compared to the NSGA II and PAES on a few test functions. The approach
uses three forms of elitism, including an external file of nondominated vectors
and a refilling process that allows us to approach the true Pareto front in a
successive manner. Also, we use an adaptive grid (similar to the one used by
PAES [14]) that maintains diversity in an efficient way.

The approach still needs more validation (particularly, with MOPs that have
more decision variables and constraints), and needs to be compared with other
EMOO approaches (under similar conditions) using some of the metrics that
have been proposed in the literature (see for example [?]). We only provided
running times produced on a PC, and a more exhaustive comparison is still
lacking. However, the preliminary results presented in this paper indicate the
potential of the approach.

Some other future work will be to refine part of the process, so that we can
eliminate some of the additional parameters that the approach needs. Since some
of them are not very critical (e.g., the number of grid subdivisions, or the
number of iterations to reach nominal convergence), we could probably preset
them automatically to a reasonable value so that the user does not need to
provide them.

Finally, we are also interested in using this approach as a basis to develop a
model of incorporation of preferences from the decision maker [?].



Abstract. In this paper the method of selecting a representative subset of
Pareto optimal solutions is used to make the search of the Pareto frontier
more effective. Firstly, the evolutionary algorithm method for generating a
set of Pareto optimal solutions is described. Then, the indiscernibility
interval method is applied to select a representative subset of Pareto optimal
solutions. The main idea of this method consists in removing from the set of
Pareto optimal solutions those solutions which are close to each other in the
space of objectives, i.e., those solutions for which the values of the
objective functions differ by less than an indiscernibility interval. The set
of Pareto optimal solutions is reduced using the indiscernibility interval
method after running a certain number of generations. This process can be
called the filtration process, in which less important Pareto optimal
solutions are removed from the existing set. Finally, two design optimization
problems are solved using the proposed method. From these examples it is clear
that the computation time can be reduced significantly while the real Pareto
frontier is still obtained.



1 Introduction 

While running evolutionary algorithms for multicriteria optimization, the set
of Pareto solutions increases with the number of generations. For some
problems the final result of running the program is a Pareto set which
contains hundreds or even thousands of solutions, and some of them might be
very close to each other considering the values of the objective functions.
For the decision maker it is difficult and tiresome to analyze all these
solutions. Moreover, in each generation a newly generated solution is compared
with all solutions from the Pareto set. If the set of Pareto solutions becomes
larger, the calculation time increases proportionally to the size of this set.
Osyczka & Montusiewicz, 1994, noticed this problem while using a simple random
search method. They proposed some methods of selecting a representative subset
of Pareto optimal solutions. Montusiewicz, 1999, has further developed one of
these methods. While using evolutionary algorithms these methods can be very
useful considering both computation time and the decision-making problem. In
this paper the method for selecting the representative set of Pareto optimal
solutions is described and applied to the evolutionary algorithm to make the
search more effective.

2 Method for Generating Pareto Optimal Solutions 

The Pareto set distribution method is used to generate the set of Pareto
optimal solutions. The general idea is as follows: within each new generation
a set of Pareto solutions is found on the basis of two sets: the set of Pareto
solutions from the previous generation and the set of solutions created by
genetic algorithm operations within the considered generation. The new set of
Pareto solutions thus created is distributed randomly to the next generation
for one half of the population. The remaining half of the population is bred
from randomly chosen strings from the previous generation. The steps of the
method are as follows:

Step 1. Let t = 1, where t is the index of a generation.

Step 2. Generate an initial population.

Step 3. If t = 1 find the set of Pareto solutions. If t > 1 find and create
the set of Pareto solutions from the following sets: the Pareto set from
the t - 1 generation and all solutions from the t generation.

Step 4. Check if there is any improvement in the Pareto set through the last
t* generations, i.e., if any new Pareto solution was found through the
last t* generations, where t* is the user's preassigned number. If there
is an improvement go to step 5, otherwise, terminate the calculations.
Such a situation means that the genetic algorithm has not been able to
find a new Pareto optimal solution through the last t* generations.

Step 5. Substitute t = t + 1. If t < T go to step 6, where T is the
preassigned number of generations to be considered. Otherwise, terminate
the calculations.

Step 6. Create a new t generation in the following way: for odd strings
substitute randomly chosen strings from the Pareto set obtained in the
t - 1 generation, whereas for even strings substitute randomly chosen
strings from the t - 1 generation. Note that it may happen that some
strings are copied twice but in different places. This refers only to
Pareto optimal strings.

Step 7. Perform the genetic algorithm operations (crossover and mutation) on
the strings of the whole population, and go to step 3.
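
Step 6 of the procedure above can be sketched as follows. This is only an
illustration: strings are numbered from 1 as in the text, selection is uniform
at random with replacement, and the name `next_generation` is hypothetical.

```python
import random
from typing import List, Sequence


def next_generation(pareto_set: Sequence[str], previous: Sequence[str],
                    rng: random.Random) -> List[str]:
    """Fill odd positions from the Pareto set of generation t - 1 and
    even positions from the whole population of generation t - 1."""
    new_population = []
    for position in range(1, len(previous) + 1):  # strings numbered from 1
        source = pareto_set if position % 2 == 1 else previous
        new_population.append(rng.choice(source))
    return new_population
```

Because the even positions are drawn from the whole previous population, a
Pareto optimal string may appear both in an odd and in an even position, which
corresponds to the remark in Step 6 about strings being copied twice.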

Here two types of termination criteria are used. The first, which indicates
that there is absolutely no improvement in the Pareto set after running t*
generations, is the more important criterion. The other criterion is related
to the computational time needed for generating the Pareto solutions, i.e.,
the method is stopped after the assumed number of generations is executed. The
graphical illustration of this method is shown in Fig. 1. For continuous
decision variables the method generates a large number of Pareto solutions,
which are very close to each other in the space of objective functions. Thus
the indiscernibility interval method is used to select a representative set of
Pareto optimal solutions.

Fig. 1. Graphical illustration of the Pareto Set Distribution Method.



3 Indiscernibility Interval Method 



The main idea of this method consists in removing from the set of Pareto
optimal solutions those solutions which are close to each other, i.e., those
solutions for which the values of the objective functions differ by less than
an indiscernibility interval. The method introduces an indiscernibility
interval u_i for the i-th objective function. This interval, given in
percentages, defines the differences between the values of the objective
functions which qualify a solution to be removed from the Pareto set. In other
words, if there is more than one solution in the interval u_i, one solution
remains in the Pareto set and the others are discarded. The method is based on
the following definition: A solution x ∈ A is a non-dominated solution in the
indiscernibility interval sense if and only if there is no solution x^ ∈ A
such that:



for /,.(x")>0.' 



for /,.(x")<0.- 




This definition is illustrated graphically in Fig. 2, in which the
indiscernibility intervals u_1 and u_2 are denoted by shaded areas. In this
case, for the first objective function solution 16 eliminates solutions 15 and
17 and solution 20 eliminates solution 21. For the second objective function
solution 6 eliminates solutions 5 and 6. Montusiewicz, 1999, gives a more
detailed description of this method.

Fig. 2. Graphical illustration of the indiscernibility interval method.



Note that for selecting the representative subset of Pareto optimal solutions
the min-max method as presented by Osyczka & Montusiewicz, 1994, can also be
applied here. Note also that the method of selecting the representative subset
of Pareto optimal solutions may also be used at the end of running any
evolutionary algorithm based multicriteria optimization method. In this case,
instead of making the decision on the basis of hundreds of Pareto optimal
solutions generated by the evolutionary algorithm, the decision maker chooses
the most preferable solution from only several solutions, which are evenly
distributed over the Pareto frontier.
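
One possible reading of this selection step is sketched below. It is not the
authors' definition: the sketch assumes that a solution is discarded as soon
as, for at least one objective, its value lies within the given percentage of
the value of an already retained solution, and that solutions are examined in
the order given. The name `filter_representatives` is hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]


def filter_representatives(solutions: Sequence[Vector],
                           intervals_percent: Sequence[float]) -> List[Vector]:
    """Greedily keep solutions that are not indiscernible from a kept one."""
    kept: List[Vector] = []
    for candidate in solutions:
        close = any(
            abs(c - k) < abs(k) * u / 100.0  # interval relative to kept value
            for other in kept
            for c, k, u in zip(candidate, other, intervals_percent)
        )
        if not close:
            kept.append(candidate)
    return kept
```

A larger interval removes more solutions, so the percentages directly control
how many representatives are left for the decision maker.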



4 Evolutionary Algorithm Method with Selecting the 
Representative Subset of Pareto Solutions 

The idea of the method is to reduce the set of Pareto optimal solutions using the 
indiscernibility interval method after running a certain number of generations. This 
process can be called the filtration process, in which less important Pareto optimal 
solutions are removed from the existing set. 

The steps of the method are as follows: 

Step 1. Set t = 1, where t is the number of the currently run generation. 

Step 2. Generate the set of Pareto optimal solutions using any evolutionary 
algorithm method. 

Step 3. Is the criterion for filtering the set of Pareto solutions satisfied? If yes, 
select the representative subset of Pareto solutions using the indiscernibility 
interval method and go to step 4. Otherwise, go straight to step 4. 

Step 4. Set t = t + 1 and if t < T, where T is the assumed number of generations, 
go to step 2. Otherwise, terminate the calculations. 

Note that if the answer in Step 3 is yes, we start what can be called the filtration 
process, since we filter the Pareto set and retain in it only those solutions 
which are not close to each other in the space of objectives. Note also that Step 3 
introduces the term criterion for filtration. Three types of criteria can be used 
here: 

Type 1. The number of solutions in the Pareto set exceeds the assumed number P, 
for example 100. 

Type 2. The number of solutions in the Pareto set is assumed as P. The first 
filtration is made if the number of solutions in the Pareto set exceeds this 
number. Each following filtration is made when P new Pareto optimal solutions 
have been added to the set. 

Type 3. The filtration is made after running the assumed number of generations P 
and in this case the number of the solutions in the Pareto set is not con- 
trolled. 

These three types of criteria may produce slightly different results, but generally all of 
them reduce the computation time significantly. The choice of the criterion depends on 
the problem to be solved. With any of these three criteria the choice of P should be made 
with great care. If P is too small, the Pareto solutions retained might not be 
representative of the problem and the evolutionary algorithm may not reach the real Pareto 
frontier. If P is too large, we lose the effect of reducing the calculation time. The 
choice of the indiscernibility interval u_i is also very important. If u_i is too small, the 
number of rejected solutions is also too small and there is no effect in reducing the set of 
Pareto solutions, whereas too large a value of u_i may make the subset of the obtained 
solutions too small to be representative. 
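
Steps 1 to 4 and the three filtration criteria can be sketched as a small driver loop. The sketch below is an assumption-laden illustration: `evolve` and `filter_pareto` are hypothetical callbacks standing in for one generation of any evolutionary algorithm and for the indiscernibility interval method, and for Type 2 the number of newly added solutions is approximated by the growth of the Pareto list.

```python
def run_with_filtration(evolve, filter_pareto, T, P, criterion):
    """Run T generations and filter the Pareto set when the criterion fires.

    evolve(pareto)        -> updated Pareto list after one generation (Step 2)
    filter_pareto(pareto) -> representative subset (Step 3)
    criterion             -> 1, 2 or 3, matching Type 1, Type 2, Type 3
    """
    pareto, filtrations = [], 0
    added = 0              # growth since the last filtration (used by Type 2)
    filtered_once = False
    for t in range(1, T + 1):                 # Step 1 and Step 4
        before = len(pareto)
        pareto = evolve(pareto)               # Step 2
        added += max(0, len(pareto) - before)
        if criterion == 1:
            due = len(pareto) > P
        elif criterion == 2:
            due = added >= P if filtered_once else len(pareto) > P
        elif criterion == 3:
            due = t % P == 0                  # here P counts generations
        else:
            raise ValueError("criterion must be 1, 2 or 3")
        if due:                               # Step 3
            pareto = filter_pareto(pareto)
            filtrations += 1
            added, filtered_once = 0, True
    return pareto, filtrations


# Toy stand-ins: each generation adds 10 solutions, filtering keeps every other one.
grow = lambda p: p + list(range(10))
halve = lambda p: p[::2]
result_type1 = run_with_filtration(grow, halve, T=5, P=25, criterion=1)
```

The toy callbacks only exercise the control flow; in a real run the Pareto set is updated by dominance checks, so its growth per generation is irregular.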



5 Numerical Examples 

5.1 Beam Design Problem 

Let us consider the problem of the optimum design of the beam whose scheme is 
presented in Fig. 3. The multicriteria optimization problem is formulated as follows. 
The vector of decision variables is: 

    x = [x_1, x_2, ..., x_N]^T, 

where x_n is the thickness of the n-th part of the beam. 

The objective functions are: 

- the volume of the beam 

    f_1(x) = b l \sum_{n=1}^{N} x_n 

- displacement under the force F: 

    f_2(x) = \frac{F l^3}{2E} \left[ \frac{1}{I_1} + \sum \ldots \right] 

where 

    I_n = \frac{b x_n^3}{12}   for n = 1, 2, ..., N 


The constraints are: 

    \frac{6 F n l}{b x_n^2} \le \sigma   for n = 1, 2, ..., N 

    0 \le \ldots \le d   for n = 1, 2, ..., N-1 

Fig. 3. Scheme of the beam. 


The problem was run for the following data: 

    N = 6, l = 50 [mm], b = 50 [mm], F = 10000 [N], 

    E = 2.06 × 10^5 [N/mm^2], σ = 360 [N/mm^2], d = 32 [mm] 

The problem was considered as a continuous programming problem with the following 
ranges of decision variables: 12 ≤ x_i ≤ 32 [mm] for i = 1, 2, ..., 6. 

The evolutionary algorithm parameters were as follows: 

- Number of generations T = 1000 

- Population size J = 100 

- Crossover rate = 0.6 

- Mutation rate = 0.08 

- Penalty rate r = 10^ 

Experiments were carried out using the indiscernibility interval u_i = 5% for i = 1, 2, ..., 6 
and for P = 100. The results of the experiments are shown in Fig. 4, in which the Pareto 
frontiers for solutions without filtration and with filtration are compared for the three 
types of filtration criteria. The solutions depicted by black points were obtained while 
running the evolutionary algorithm without the filtration process, whereas those depicted 
by almost white points were obtained while running the evolutionary algorithm with the 
assumed type of filtration. Table 1 presents the number of Pareto optimal solutions 
generated in each experiment and the computing time for each experiment. These 
experiments show that, using the indiscernibility interval method with the evolutionary 
algorithm, the computation time may be reduced several times while almost the same 
Pareto frontier is still obtained, which in some parts is even better than the one 
obtained by the ordinary evolutionary algorithm. 

Fig. 4. Sets of Pareto optimal solutions for the beam design problem. 



Table 1. Comparison of the results for different filtration types for the beam design problem. 



Method                Number of Pareto solutions    Time [s] 
Without filtration    2007                          608 
Filtration Type 1     152                           29 
Filtration Type 2     1102                          184 
Filtration Type 3     207                           34 


5.2 Robot Gripper Design Problem 



Let us consider an example of the optimum design of a robot gripper as formulated by 
Krenich and Osyczka, 2000. The scheme of the gripper is presented in Fig. 5. For this 
gripper the vector of decision variables is x = [a, b, c, e, f, l, δ]^T, where a, b, c, e, f, l 
are dimensions of the gripper and δ is the angle between the b and c elements of the gripper. 
The geometrical dependencies of the gripper mechanism are (see Fig. 6): 



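    g = \sqrt{(l - z)^2 + e^2}, 

    \phi = \arctan\left( \frac{e}{l - z} \right), 

    a^2 = b^2 + g^2 - 2 b g \cos(\beta + \phi),   b^2 = a^2 + g^2 - 2 a g \cos(\alpha - \phi), 

    \alpha = \arccos\left( \frac{a^2 + g^2 - b^2}{2 a g} \right) + \phi,   \beta = \arccos\left( \frac{b^2 + g^2 - a^2}{2 b g} \right) - \phi 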




Fig. 5. Scheme of the robot gripper mechanism. 

Fig. 6. Geometrical dependencies of the gripper mechanism. 

The distribution of the forces is presented in Fig. 7 and so we have: 



Fig. 7. Distribution of the forces in the mechanism of the gripper. 

Using the above formulas the objective functions can be evaluated as follows: 

- The first objective function is the difference between the maximum and minimum 
gripping forces for the assumed range of gripper ends displacement: 

    f_1(x) = \max_z F_k(x, z) - \min_z F_k(x, z) 

- The second objective function is the force transmission ratio: 

    f_2(x) = \frac{P}{\min_z F_k(x, z)} 



Note that both objective functions depend on the vector of decision variables and on 
the displacement z. Thus for the given vector x the values of f_1(x) and f_2(x) can be 
evaluated by using a special procedure which finds the maximum of F_k(x, z) and the 
minimum of F_k(x, z) for different values of z. This procedure makes the objective 
functions computationally expensive and the problem becomes more complicated than 
a general nonlinear programming problem. 
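
The evaluation procedure described above can be sketched as a simple scan over z. Everything specific in this sketch is an assumption: the geometric relations are taken in the form g = sqrt((l - z)^2 + e^2), phi = atan(e / (l - z)) with alpha and beta from the law of cosines, the force relation F_k = P b sin(alpha + beta) / (2 c cos(alpha)) is the form commonly quoted for this gripper benchmark and is not given in the text, and the grid scan, function names and sample values are illustrative only.

```python
import math


def gripping_force(x, z, P=100.0):
    """Gripping force F_k(x, z) under the assumed geometry and force relation."""
    a, b, c, e, f, l, delta = x
    g = math.sqrt((l - z) ** 2 + e ** 2)
    phi = math.atan(e / (l - z))
    alpha = math.acos((a * a + g * g - b * b) / (2 * a * g)) + phi
    beta = math.acos((b * b + g * g - a * a) / (2 * b * g)) - phi
    # Assumed force relation, not taken from the text above.
    return P * b * math.sin(alpha + beta) / (2 * c * math.cos(alpha))


def objectives(x, z_max=50.0, steps=200, P=100.0):
    """Approximate f_1 and f_2 by sampling z on a uniform grid in [0, z_max]."""
    forces = [gripping_force(x, z_max * k / steps, P) for k in range(steps + 1)]
    f1 = max(forces) - min(forces)   # spread of the gripping force
    f2 = P / min(forces)             # force transmission ratio
    return f1, f2


# Arbitrary illustrative design vector (a, b, c, e, f, l, delta); it is not
# claimed to satisfy the side constraints used in the experiments.
f1, f2 = objectives((150.0, 150.0, 200.0, 0.0, 50.0, 150.0, 1.5))
```

Each objective evaluation costs `steps + 1` force computations, which is why the text calls these objective functions computationally expensive; a one-dimensional optimiser over z could replace the grid.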
From the geometry of the gripper the following constraints can be derived: 

    g_1(x) = Y_min - y(x, Z_max) ≥ 0, 
    g_2(x) = y(x, Z_max) ≥ 0, 
    g_3(x) = y(x, 0) - Y_max ≥ 0, 
    g_4(x) = Y_G - y(x, 0) ≥ 0, 
    g_5(x) = (a + b)^2 - l^2 - e^2 ≥ 0, 
    g_6(x) = (l - Z_max)^2 + (a - e)^2 - b^2 ≥ 0, 
    g_7(x) = l - Z_max ≥ 0 

where: 

y(x, z) = 2 · [e + f + c · sin(β + δ)] - displacement of the gripper ends, 

Y_min - minimal dimension of the gripping object, 

Y_max - maximal dimension of the gripping object, 

Y_G - maximal range of the gripper ends displacement, 

Z_max - maximal displacement of the gripper actuator. 

Fig. 8. Constraints illustration: a) constraint V, b) constraint VI. 

The method was run using the following data: 

1. Side constraints: 

10 ≤ a ≤ 250, 10 ≤ b ≤ 50, 100 ≤ c ≤ 300, 0 ≤ e ≤ 50, 

10 ≤ f ≤ 250, 100 ≤ l ≤ 300, 1.0 ≤ δ ≤ 3.14. 

2. Geometric parameters: 

Z™„=50, F_=100, Fc=50, Z_=50, P = 100. F_=100, 

3. Parameters for the evolutionary algorithm: 

- Length of the string for every decision variable: 18 bits, 

- Crossover rate = 0.6, 

- Mutation rate = 0.01, 

- Population size J = 400, 

- Number of generations T = 1000. 

The problem was considered as a continuous nonlinear programming problem. Experiments 
were carried out using the indiscernibility interval u_i = 5% for i = 1, 2, ..., 6 and for 
P = 100. The results of the experiments are shown in Fig. 9, in which the Pareto frontiers 
for solutions without filtration and with filtration are compared for the three types of 
filtration criteria. The solutions depicted by black points were obtained while running the 
evolutionary algorithm without the filtration process, whereas those depicted by almost 
white points were obtained while running the evolutionary algorithm with the assumed type 
of filtration. Table 2 presents the number of Pareto optimal solutions generated in 
each experiment and the computing time for each experiment. These experiments show 
that, using the indiscernibility interval method with the evolutionary algorithm, the 
computation time may be reduced several times while almost the same Pareto frontier is 
still obtained, which in some parts is even better than the one obtained by the ordinary 
evolutionary algorithm. 

Table 2. Comparison of the results for different filtration types for the robot gripper design. 



Method                Number of Pareto solutions    Time [s] 
Without filtration    1882                          682 
Filtration Type 1     308                           83 
Filtration Type 2     247                           67 
Filtration Type 3     270                           69 



6 Conclusions 

The paper deals with the problem of reducing the calculation time while generating 
the Pareto optimal solutions for multicriteria continuous nonlinear programming 
problems. The main idea of the presented approach is to maintain only a representative 
subset of Pareto optimal solutions while running the evolutionary algorithm based 
method. The indiscernibility interval method is applied to select a representative 
subset of Pareto optimal solutions. The results obtained using this method with the genetic 
algorithm indicate that the computation time can be reduced significantly while the 
real Pareto frontier is still obtained. The process of selecting the representative subset of 
Pareto optimal solutions can be called the filtration process, in which less important 
Pareto optimal solutions are removed from the existing set. Three different types of 
filtration process can be performed and they are presented in the paper. All these 
types give a significant reduction in the computation time, but from the experiments 
carried out so far it is difficult to recommend the best one. More experiments are 
needed and this will be the subject of future investigation. 

Fig. 9. Sets of Pareto optimal solutions for the gripper design problem. 

Abstract. Many optimisation problems in circuit design, in the following also 
referred to as VLSI CAD, consist of mutually dependent sub-problems, where 
the resulting solutions must satisfy several requirements. Recently, a new 
model for Multi-Objective Optimisation (MOO) for applications in 
Evolutionary Algorithms (EAs) has been proposed. The search space is 
partitioned into so-called Satisfiability Classes (SCs), where each region 
represents the quality of the optimisation criteria. By applying the SCs to 
the individuals in a population, a fitness can be assigned during the EA run. The 
model also allows the handling of infeasible regions and restrictions in the 
search space. Additionally, different priorities for optimisation objectives can 
be modelled. In this paper, the model is studied in further detail. Various 
properties are shown and advantages and disadvantages are discussed. The 
relations to other techniques are presented and experimental results are given 
to demonstrate the efficiency of the model. 



1 Introduction 

Evolutionary Algorithms (EAs) are becoming more and more important as a tool for search 
and optimisation. They have often been applied successfully, especially to hard 
combinatorial problems (see e.g. [Mic94,Dre98]). This type of problem is often 
encountered in Very Large Scale Integration CAD (VLSI CAD), since problem instances 
of several million components often have to be considered there. Multiple, 
competing criteria have to be optimised subject to a large number of non-trivial 
constraints. One strategy is to artificially divide a problem into a number of 
sub-problems, which are then solved in sequence. Obviously, this is not a promising 
strategy if the objectives are conflicting. EAs are well suited for solving this kind of 
problem, when mutually dependent sub-problems are considered in parallel. One 
problem that arises when using EAs is how to evaluate the solutions of a population. It 
is therefore necessary to determine a ranking of the elements to see which solutions are 
better than others. 

Traditionally, the evaluation is done by an objective function which maps a solution with 
multiple objectives to a single value. A classical method is the linear combination by 
weighted sum, where the value of each objective is weighted by a constant coefficient. 
The values of the weights determine how strongly each objective influences the 
single fitness value. One disadvantage is that the weights have to be 
known in advance to find good solutions, or have to be determined by experiments. 
Obviously, this is time consuming and not desirable, since the parameters resulting 
from different runs may vary, ending in “in-stable” algorithms [Esb96]. 



1.1 Previous Work 

Advanced methods for ranking solutions with multiple objectives have been 
developed over the years. If priorities exist between the objectives, a simple 
lexicographic order can be used. (Information on lexicographic sorting of vectors can 
be found in standard mathematical literature.) 

In [Gol89] a method is described where solutions with multiple objectives without 
preferences can be compared. This is realised by a relation called dominate. A 
solution x dominates y if x is equal to or better than y in each objective, and x is 
strictly better than y in at least one component. Thus, the solutions in the search space can 
be ranked by the relation dominate. This approach of ranking solutions is the core of 
many EA-tools for Multi-Objective Optimisation (MOO) [SD95,EE95,ZT99]. 

In [EK96] another approach is proposed, where the search space is divided in a 
satisfiable, acceptable, and invalid range. This model has successfully been applied to 
one specific problem in the area of VLSI CAD, but it requires user interaction. The 
designer has to specify the limits between satisfiable, acceptable, and invalid 
solutions. The limits have to be adapted during the program run to obtain high quality 
results. 

Recently, in [DDB99] a new model for ranking solutions of MOO problems has been 
proposed. A relation favour is defined analogously to dominate in [Gol89]. The 
search space is divided into several Satisfiability Classes (SCs). Thus the approach 
can be seen as a generalisation of the approach in [EK96] using a finer granularity. 



1.2 Results 

In this paper the model from [DDB99] is studied in further detail. The relation favour 
in comparison to other models, like weighted sum and dominate [Gol89], is shown, 
e.g. favour is able to compare solutions which are not comparable using relation 
dominate. 

By this technique no user interaction is required any more. This is a very important 
aspect in VLSI CAD tools, since they are often so complex that it is very hard for a 
user to keep control. Furthermore, handling of priorities is also supported. Infeasible 
solutions are assigned to their own SCs depending on the objectives which are in an 
infeasible range. The SCs can be efficiently manipulated using operations on graphs. 
Finally, experimental evaluations are also given. 

2 Multi-objective Optimisation Problems 

In general, many optimisation problems consist of several mutually dependent 
sub-problems. MOO problems can be defined as follows: Let Π be the feasible range of 
solutions in a given search space. The objective function f_c : Π → R_+^n assigns a cost 
to each objective of a solution s, where f_c(s) = (f_{c_1}(s), f_{c_2}(s), ..., f_{c_n}(s)), with 
s ∈ Π, n ∈ N and R_+ the positive real valued numbers¹. 

To compare several multi-objective solutions superior points in the search space can 
be determined. In the following we restrict ourselves to minimisation problems. The relation 
dominate is defined as proposed in [Gol89]. Let x, y ∈ R_+^n be the costs of two 
different solutions. x dominates y (x ≺_d y) if x ≠ y and y is at least as large as x in each 
component. More formally: 

Definition 2.1 

x ≺_d y :⇔ (∃ i : f_{c_i}(x) < f_{c_i}(y)) ∧ (∀ j ≠ i : f_{c_j}(x) ≤ f_{c_j}(y)) 
x ⪯_d y :⇔ (∀ i : f_{c_i}(x) ≤ f_{c_i}(y)) 

A non-dominated solution is called a Pareto-optimal solution. Relation ≺_d defines the set 
of all Pareto-optimal solutions, called the Pareto-set, and additionally is a partial order. 
All elements x ∈ Π in the Pareto-set are equal or not comparable. Usually, all points 
in this set are of interest for the decision maker or designer. 
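
The relation dominate and the resulting Pareto-set can be sketched directly from Definition 2.1 for minimisation problems. The function names are illustrative, and the sample vectors are the ones used later in Example 4.1.

```python
def dominates(x, y):
    """True if cost vector x dominates y (minimisation):
    x is nowhere worse than y and strictly better in at least one component."""
    return all(a <= b for a, b in zip(x, y)) and any(a < b for a, b in zip(x, y))


def pareto_set(costs):
    """Return the cost vectors that are not dominated by any other vector."""
    return [x for x in costs if not any(dominates(y, x) for y in costs)]


costs = [(8, 7, 1), (1, 9, 6), (7, 0, 9), (1, 1, 2), (2, 1, 1)]
front = pareto_set(costs)
```

This naive filter compares every pair of vectors, which is adequate for the small populations used for illustration.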



3 The Model 

We briefly review the model from [DDB99]. The main idea of the proposed model is 
to extend the approach from [EK96] such that the search space is divided into more 
than three categories, e.g. superior, very good, good, satisfiable, and invalid. 

In the following the solutions are divided into so-called Satisfiability Classes (SCs) 
depending on their quality. The SCs are computed by a relation denoted favour, i.e. no 
limits have to be specified by user interaction. Solutions of “similar” quality belong 
to the same SC and thus each SC corresponds to a class of solutions of the same 
quality. After sorting the SCs with respect to their quality a ranking of the solutions is 
obtained. 

Let Π be a finite set of solutions and n ∈ N. The objective function f_c : Π → R_+^n 
assigns a cost to each x ∈ Π as defined above. To classify solutions in SCs we define 
the relation favour (≺_f): 

Definition 3.1 

x ≺_f y :⇔ |{i : f_{c_i}(x) < f_{c_i}(y), 1 ≤ i ≤ n}| > |{j : f_{c_j}(x) > f_{c_j}(y), 1 ≤ j ≤ n}| 
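
Definition 3.1 amounts to counting in how many components each vector is strictly better than the other. A minimal sketch, with the function name `favours` chosen for illustration:

```python
def favours(x, y):
    """True if x is favoured over y: x is strictly smaller than y in more
    components than y is strictly smaller than x (minimisation)."""
    better = sum(a < b for a, b in zip(x, y))
    worse = sum(a > b for a, b in zip(x, y))
    return better > worse


# Three vectors that beat each other in turn, so the relation is not transitive.
cycle = [(8, 7, 1), (1, 9, 6), (7, 0, 9)]
is_cycle = all(favours(cycle[k], cycle[(k + 1) % 3]) for k in range(3))
```

Ties in a component count for neither side, so two vectors can be incomparable even when neither dominates the other.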



¹ In general MOO problems the objective functions are not necessarily restricted to R_+^n, 
but it is convenient for our purposes. 

4 Properties of Relation Favour 

We start with some general properties (that have partially already been observed in 
[DDB99]). Then we study the advantages and disadvantages in more detail and also 
focus on several properties of the model. 



4.1 Basics 

Using Definition 3.1 we are able to compare elements x, y ∈ Π in pairs. More 
precisely, x is favoured to y (x ≺_f y) iff i (i ≤ n) components of x are smaller than the 
corresponding components of y and only j (j < i) components of y are smaller than the 
corresponding components of x. 

We use a graph representation to describe the relation, where each element is a node 
and “preferences” are given by edges. Relation ≺_f is not a partial order, because it is 
not transitive, as can be seen as follows: 





Fig. 4.1. Relation graph, panels a), b) and c) 

Example 4.1: Consider the following solution vectors: 

(8,7,1) (1,9,6) (7,0,9) (1,1,2) (2,1,1) 

The relation graph of ≺_f is given in Figure 4.1 a). Vectors (1,1,2) and (2,1,1) are 
preferred to all other vectors, but they are not comparable with each other. The remaining 
three vectors (8,7,1), (1,9,6), and (7,0,9) are comparable in pairs, but as can be seen in the 
relation graph they describe a “cycle”. Thus relation ≺_f is not transitive. 

To get some more insight into the structure of the model we briefly focus on the 
meaning of the cycles in the relation graph: Elements are ranked equally when they 
are included in a cycle, because no element is superior to all the others. Elements that 
describe a cycle are denoted as not comparable. The determination of the Strongly 
Connected Components (SCC) of the relation graph groups all elements which are not 
comparable in one SCC. The SCCs are computed by a DFS-based linear time graph 
algorithm [CLR90]. A directed graph G_Z is constructed by replacing each SCC in G 
by one node representing this SCC. Thus, all cycles in G are eliminated. 

Let G = (V,E) be the graph that represents relation ≺_f and a set of solutions and let Z 
be the set of SCCs in G, with Z = (Z_1, ..., Z_r) and Z_i = (V_i, E_i), 1 ≤ i ≤ r ≤ |V|. Let G_Z = 
(V_Z, E_Z) be the graph where all SCCs Z_i of G are replaced by a node v_{Z_i} representing the 
SCC. An edge (v_{Z_i}, v_{Z_j}), 1 ≤ i, j ≤ r, i ≠ j, is set in G_Z if and only if there exists an edge 
(v_k, v_l) ∈ E, 1 ≤ k, l ≤ |V|, k ≠ l, with v_k ∈ Z_i and v_l ∈ Z_j, Z_i ≠ Z_j. It directly follows: 

Lemma 4.1: The directed graph G_Z has no cycles. 

The relation that is described by the relation graph G_Z is denoted by ≺_Z. Since G_Z is 
acyclic, it is possible to determine a level sorting of the nodes. 

For each node in G_Z we define an SC. Level sorting of the nodes in G_Z determines the 
ranking of the SCs; each level contains at least one node of G_Z. Then each level 
corresponds to exactly one SC. Using the level sorting it is possible to group nodes 
(sets of solutions) that are not connected by an edge in the same SC. These solutions 
are not comparable with respect to relation ≺_f and thus they should be ranked in the 
same level of quality. There are two possibilities to perform the level sorting: 

1. starting with nodes v that have indegree(v) = 0 or 

2. starting with nodes v that have outdegree(v) = 0. 

In the following the first strategy is used, since the best solutions that are not 
comparable should be placed in the same SC. 

Example 4.2: In Figure 4.1 b) SCC 1, SCC 2, and SCC 3 are illustrated. As can be 
seen the elements of SCC 1 and SCC 2 are superior to the elements of SCC 3. Figure 
4.1 c) shows the relation graph G_Z after level sorting. Level 1 corresponds to SC 1 and 
level 2 corresponds to SC 2. 
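
The whole classification pipeline (relation graph, SCCs, condensed graph, level sorting from the nodes with indegree 0) can be sketched as follows. This is an illustrative sketch rather than the authors' implementation: Kosaraju's two-pass DFS is used as one possible DFS-based SCC algorithm, the recursive DFS is only suitable for small populations, and all function names are hypothetical.

```python
def favours(x, y):
    better = sum(a < b for a, b in zip(x, y))
    worse = sum(a > b for a, b in zip(x, y))
    return better > worse


def scc_labels(n, adj):
    """Kosaraju's algorithm: returns (component index per node, number of SCCs)."""
    order, seen = [], [False] * n

    def dfs_forward(v):
        seen[v] = True
        for w in adj[v]:
            if not seen[w]:
                dfs_forward(w)
        order.append(v)                      # record finishing order

    for v in range(n):
        if not seen[v]:
            dfs_forward(v)
    radj = [[] for _ in range(n)]            # reversed graph
    for v in range(n):
        for w in adj[v]:
            radj[w].append(v)
    comp = [-1] * n

    def dfs_backward(v, c):
        comp[v] = c
        for w in radj[v]:
            if comp[w] == -1:
                dfs_backward(w, c)

    count = 0
    for v in reversed(order):
        if comp[v] == -1:
            dfs_backward(v, count)
            count += 1
    return comp, count


def satisfiability_classes(costs):
    """Return the SC (level) of every cost vector; level 1 is the best class."""
    n = len(costs)
    adj = [[j for j in range(n) if j != i and favours(costs[i], costs[j])]
           for i in range(n)]
    comp, r = scc_labels(n, adj)
    succ, indeg = [set() for _ in range(r)], [0] * r
    for v in range(n):                       # build the condensed, acyclic graph
        for w in adj[v]:
            if comp[v] != comp[w] and comp[w] not in succ[comp[v]]:
                succ[comp[v]].add(comp[w])
                indeg[comp[w]] += 1
    level, current = [0] * r, 1
    frontier = [c for c in range(r) if indeg[c] == 0]
    while frontier:                          # level sorting from indegree 0
        nxt = []
        for c in frontier:
            level[c] = current
            for d in succ[c]:
                indeg[d] -= 1
                if indeg[d] == 0:
                    nxt.append(d)
        frontier, current = nxt, current + 1
    return [level[comp[v]] for v in range(n)]


classes = satisfiability_classes(
    [(8, 7, 1), (1, 9, 6), (7, 0, 9), (1, 1, 2), (2, 1, 1)])
```

For the vectors of Example 4.1 the two incomparable best vectors land in level 1 and the three vectors forming the cycle share level 2.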

In the following some properties of relation favour are presented. For the case n = 2 it 
holds: 

Lemma 4.2: Let f_c : Π → R_+^2. Relations ≺_d and ≺_f are equal, i.e. for x, y ∈ Π it holds: 

x ≺_d y ⇔ x ≺_f y 

Proof: Using Definition 3.1 with n = 2 it follows: Let i (j) ∈ N_0 be the number of 
components of x (y) that are smaller than the corresponding components of y (x). 
Then for x ≺_f y it holds that j < i ≤ 2 and i + j ≤ 2, so that either 
i = 1 and j = 0 or i = 2 and j = 0. 

Case 1: i = 1, j = 0 

It follows: (x_1 < y_1) ∧ (x_2 = y_2) ⇒ x ≺_d y 

Case 2: i = 2, j = 0 

It follows: (x_1 < y_1) ∧ (x_2 < y_2) ⇒ x ≺_d y 

Conversely, if x ≺_d y, it follows that i = 2 and j = 0 or i = 1 and j = 0, and hence x ≺_f y. 

Lemma 4.2 does not hold for n > 2 as can be stated by a counterexample: 

Example 4.3: Consider the vectors given in Example 4.1. Applying relation dominate, 
vectors (1,1,2), (2,1,1) and (7,0,9) are not dominated by any other element; thus they 
are ranked as the best elements of the given vector set. In contrast, under relation 
favour only the two vectors (1,1,2) and (2,1,1) are the best elements. 

For the computation time of the SC classification an approximation can be given: 

Theorem 4.1: The computation time for the SC classification is O(|P|^2 · n). 

Proof: Each element in a population P is compared to all other elements in P and in 
each comparison n components are tested. The number of nodes in the relation graph 
is |P| and the operations on G_Z have linear time complexity. Thus the computation 
time for the classification is O(|P|^2 · n). 

First, we mention two points that should be considered when using the model. These 
points can be seen as disadvantages, if compared to other approaches. 

1. Incomparable solutions may be placed in different SCs, even though there might be 
cases where the user wants them to be in the same class. 

2. The different optimisation goals are considered in parallel and not relative to each 
other (see also discussion below). 

But, the model has several advantages over other approaches: 

• No problem specific knowledge is needed for the choice of the weights of the 
fitness function, like in the case of weighted sum. 

• Even if the components of a solution have different measures, a scaling of these 
components is not needed, since there are no distances between the solutions 
computed (like e.g. using a weighted sum or information about distances between 
solutions). 

• Depending on the population, the model dynamically adapts the relation graph that 
performs the comparison of the fitness function. In each generation the relation 
graph is re-computed and the ranking is updated. (This is done fully automatically: 
no user interaction is required and no limits between the SCs have to be 
specified.) Thus, the granularity of the SCs is dynamically adapted to present 
conditions. 

• Relation ≺_f can also handle infeasible solutions. An infeasible component of the 
solution is considered as the worst possible value. 

• If the structure of the search space changes during the EA run these changes are 
directly included in the relation that is updated online. 

• Experiments have shown that the model results in a finer granularity than relation 
dominate. 

• Due to the efficient representation based on graphs the run times are very low. 

• Handling of priorities of all (or some) objectives and of infeasible solutions is fully 
supported. 

• No user interaction is required. This is very important in complex applications 
where the user cannot keep control of all parameters by hand. 

In summary, there exist many advantages of the model compared to the “standard” 
techniques, which is demonstrated in the experiments. 

In the following some further scenarios are discussed that can occur in practice. 

4.2 Sorting Using Priorities 

So far the model cannot handle priorities, e.g. give a priority to the 
different optimisation criteria. This restriction is undesirable in most complex 
applications, but we now outline how priorities can easily be added to the model as defined 
above. Let pr ∈ [0,1]^n be the priority vector, where for each x ∈ Π, pr_i denotes the 
priority of component x_i. The lower pr_i is chosen, the higher is the priority of the 
corresponding component. If all values of pr_i are different, a lexicographic sorting can be used. 

Components with the same priority value are compared using relation ≺_f. Only for these 
components a ranking is computed. Then, based on this sorted set of elements, a final 
ranking is determined by lexicographic sorting. 

4.3 Invalid Solutions 

The model is very well suited to handle invalid solutions. The classification of when an 
element is to be regarded as “invalid” has to be defined by the user. The main idea is to 
modify the comparison operator: the invalid elements (for one or some specific 
components) always lose all comparisons carried out during the evaluation, i.e. if 
component x_i, 1 ≤ i ≤ n, of vector x ∈ R_+^n is invalid then it holds x_i > y_i for all valid 
y ∈ Π. This approach has been shown to work very well in practice. 



4.4 Relation to Weighted Sum 

Finally, we want to briefly comment on the relation of the model to the weighted sum 
approach, which is still used in many applications. The new model is not a cover of the 
weighted sum, i.e. the weighted sum cannot be embedded. The difference is that using 
the presented model the solutions are ranked relative to each other. Using the 
weighted sum the absolute values of each component are combined to determine a 
single fitness value. This is demonstrated by constructing an example: 

Example 4.4: Assume that the function f : R → R_+^2 with f_1(x) = x^2 and f_2(x) = (x-2)^2 is 
given. If ω = (0.5, 0.5) is the weight vector for the weighted sum, the minimal solution 
is x = 1 with f(x) = (1,1) and the corresponding weighted sum is 0.5·1^2 + 0.5·(1-2)^2 = 1. 
It can be seen that for all other x values the sum is larger, e.g. the weighted sums of 
solutions x = 0 with f(x) = (0,4) and x = 2 with f(x) = (4,0) are 0.5·0^2 + 0.5·(0-2)^2 = 2 and 
0.5·2^2 + 0.5·(2-2)^2 = 2, respectively. Using the model presented above it is not possible to 
perform a ranking such that (1,1) is better than solution (0,4) or (4,0). These three 
solutions are not comparable by relation favour. 
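
The numbers in Example 4.4 can be checked with a short script. It takes f_1(x) = x^2 and f_2(x) = (x-2)^2, which is what the quoted cost vectors (0,4), (1,1) and (4,0) imply; the helper names are illustrative.

```python
from itertools import permutations


def f(x):
    """Cost vector of Example 4.4, assuming f_1(x) = x^2 and f_2(x) = (x-2)^2."""
    return (x ** 2, (x - 2) ** 2)


def weighted_sum(cost, weights=(0.5, 0.5)):
    return sum(w * c for w, c in zip(weights, cost))


def favours(x, y):
    better = sum(a < b for a, b in zip(x, y))
    worse = sum(a > b for a, b in zip(x, y))
    return better > worse


points = [f(0), f(1), f(2)]
sums = [weighted_sum(p) for p in points]
any_favoured = any(favours(p, q) for p, q in permutations(points, 2))
```

The weighted sum singles out (1,1), whereas no pair among the three vectors is ordered by favour, since each vector wins exactly one of the two components against each other vector.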

5 Case Study: Heuristic Learning 

Analogously to [DDB99], for carrying out experiments we choose the application 
“heuristic learning” of Binary Decision Diagram (BDD) [Bry86] variable orderings as 
proposed in [DB95,DGB96]. This application has been chosen for several reasons: 

• The problem has a high practical relevance and finds many applications in VLSI 
CAD. 

• Due to the many studies already carried out, the problem is well understood. This 
allows for clear interpretation of the results. 

• The problem is multi-objective in nature, i.e. quality (counted in number of nodes) 
and run time have to be optimised. 

To make the paper self-contained we review the main ideas from [DDB99]. 



5.1 The Learning Model 

First, the basic learning model is briefly explained (for more details see 
[DB95,Dre00]). It is assumed that the problem to be solved has the following 
property: 

A non empty set of optimisation procedures can be applied to a given (non-optimal) 
solution in order to further improve its quality. These procedures are called Basic 
Optimisation Modules (BOMs). Each heuristic is a sequence of BOMs. The length of 
the sequence depends on the problem instances that are considered. The goal of the 
approach is to determine a good (or even optimal) sequence of BOMs such that the 
overall results obtained by the heuristic are improved. 

The set of BOMs defines the set H of all possible heuristics that are applicable to the 
problem to be solved in the given environment. H may include problem specific 
heuristics, local search operators but can also include some random operators. 

The individuals of our EA make use of multi-valued strings. The sequences of 
different length are modelled by a variable size representation of the individuals. 

To each BOM h ∈ H we associate a cost function denoted by cost : H → R. cost 
estimates the resources that are needed for a heuristic, e.g. the execution time of the 
heuristic. The quality fitness measures the quality of the heuristic that is applied to a 
given example. 

Binary Decision Diagrams 

As is well known, each Boolean function f : B^n → B can be represented by a Binary 
Decision Diagram (BDD), i.e. a directed acyclic graph where a Shannon 
decomposition is carried out in each node. We make use of reduced and ordered 
BDDs [Bry86]. 

We now consider the following problem that will be solved using EAs: 

How can we develop a good heuristic to determine variable orderings for a 
BDD representing a given Boolean function f such that the number of nodes 
in the BDD is minimised? 

Notice we do not optimise BDDs by EAs directly. Instead we optimise the heuristic 
that is applied to BDD minimisation. 

Dynamic Variable Ordering 

The algorithms that are used as BOMs in the EA are based on dynamic variable 
ordering. Sifting (S) is a local search operation for variable ordering of BDDs which 
allows hill climbing. Sift light (L) is a restricted form of sifting, where hill climbing is 
not allowed. The third BOM is called inversion (I) which inverts the variable ordering 
of a BDD. For more details see [DGB96,Dre00]. 



5.2 Evolutionary Algorithm 

The configuration of the EA is described in the following sections. 

Representation 

In our application we use a multi-valued encoding, such that the problem can easily be
formulated. Each position in a string represents an application of a BOM. Thus, a
string represents a sequence of BOMs which corresponds to a heuristic. The length of
the string has an upper limit, which is given by the designer and bounds the
maximum running time of the resulting heuristics. In the following we consider
four-valued vectors: the values S, L and I represent sifting, sift light and inversion,
respectively, and the value N denotes no operation. N is used to model the variable
size of the heuristic.
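
The encoding can be illustrated with a minimal sketch. The function name `decode` and the parameter `max_len` are hypothetical and chosen only for this illustration; the sketch assumes that N positions are simply skipped when the string is read as a heuristic.

```python
from typing import List

# Alphabet taken from the text: S = sifting, L = sift light,
# I = inversion, N = no operation (padding for shorter heuristics).
ALPHABET = ("S", "L", "I", "N")


def decode(individual: str, max_len: int) -> List[str]:
    """Return the BOM sequence encoded by a four-valued string."""
    if len(individual) > max_len:
        raise ValueError("individual exceeds the designer-given limit")
    if any(gene not in ALPHABET for gene in individual):
        raise ValueError("unknown gene value")
    # N positions are skipped, so strings of equal length can encode
    # heuristics of different length.
    return [gene for gene in individual if gene != "N"]


print(decode("SNLIN", max_len=5))  # ['S', 'L', 'I']
```

Under this reading, two strings that differ only in the placement of N encode the same heuristic.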

Objective Function 

As an objective function that measures the fitness of each element we apply the
heuristics to k benchmark functions in a training set $T = \{b_1, \ldots, b_k\}$. The quality of an
individual is calculated by constructing the BDD and counting the number of nodes
for each $b_i$, $1 \le i \le k$. Additionally, the execution time (measured in CPU seconds) that
is used for the application of the newly generated heuristic is minimised. Then, the
objective function is a vector of length k+1 and is given by:

$f(T) = (\#nodes(b_1), \ldots, \#nodes(b_k), time(T))$,

where $\#nodes(b_i)$, $1 \le i \le k$, denotes the number of BDD nodes that are needed to
represent function $b_i$. The execution time that is needed to optimise all functions of the
considered training set T is denoted by $time(T)$.

The choice of the fitness function largely influences the optimisation procedure. It is
also possible to choose a fitness function as a vector of length 2·k by considering the
execution time for each benchmark $b_i$ separately instead of using the sum. By our
choice the EA focuses more on the quality of the result than on the run time needed.
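
The two vector layouts can be sketched as follows. This is only an illustration: the function names are hypothetical, the measurements are made up, and how node counts and CPU times are obtained from a BDD package is left outside the sketch.

```python
from typing import List, Sequence, Tuple

# One (node_count, cpu_seconds) pair per benchmark b_1..b_k.
Result = Tuple[int, float]


def objective_vector(results: Sequence[Result]) -> List[float]:
    """k node counts followed by the summed execution time (length k+1)."""
    nodes = [float(n) for n, _ in results]
    total_time = sum(t for _, t in results)
    return nodes + [total_time]


def objective_vector_per_benchmark(results: Sequence[Result]) -> List[float]:
    """Alternative of length 2k: k node counts, then k separate times."""
    return [float(n) for n, _ in results] + [t for _, t in results]


results = [(28, 0.25), (163, 0.5), (27, 0.25)]  # made-up measurements
print(objective_vector(results))                     # [28.0, 163.0, 27.0, 1.0]
print(len(objective_vector_per_benchmark(results)))  # 6
```

With the summed time, quality contributes k components and run time only one, which matches the remark that the chosen layout emphasises quality.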

Algorithm 

1. The initial population P is randomly generated. 

2. Then |P|/2 elements are generated by the genetic operators reproduction and
crossover. The newly created elements are then mutated.

3. The offspring is evaluated. Then the new population is sorted using relation favour.

4. If no improvement is obtained for 50 generations the algorithm stops. 

For more details about the algorithm see [DGB96]. 

6 Experimental Results 

All experiments in the following are carried out on a SUN Sparc 20 using the 
benchmark set from [Yan91]. 

First, the proposed model is compared to the weighted sum approach, because this is a
“classical” method that is used in many applications. Then a comparison to the “pure”
Pareto-based approach from [Gol89] is performed. Notice that comparisons to other
approaches, as presented in [ZT99], are not given, because these require user
interaction, e.g. if distances between solutions are computed.

6.1 Comparison to Weighted Sum 

In a first series of experiments we applied heuristic learning to a set of 5 benchmarks.
Thus 6 objectives (5 benchmarks and the execution time) are optimised in parallel.
First, only small benchmarks were studied, for which the optimal result is known. We
applied an exact algorithm, an iterated greedy heuristic, learning based on weighted
sum, and learning based on relation favour. The results are summarised in Table 6.1.
Column name denotes the considered function, in (out) shows the number of inputs
(outputs) of the function, and column optimal shows the minimal BDD size. The
applied heuristics for BDD minimisation are shown in the following columns: sift is
an iterated greedy heuristic, sum denotes the heuristic resulting from the EA with
weighted sum, and f1 and f2 are the two alternative heuristics resulting from the EA
using relation favour for ranking the individuals.

Regarding quality, the exact algorithm of course determines the best results, but its
run time is more than 100 CPU seconds. Iterated greedy is in some cases more than
30% worse than the exact result. Both learning approaches determine the same
(optimal) quality, but the heuristic constructed by relation favour is nearly twice
as fast.
Table 6.1

name           in  out  optimal  sift   sum    f1    f2
add6           12    7       28    54    28    28    28
addm4           9    8      163   163   163   163   163
cm85a          11    3       27    35    27    27    30
m181           15    9       54    56    54    54    54
risc            8   31       65    65    65    65    65
average time                > 100   0.4   1.6   0.9   0.4


Table 6.2

name           in  out  sift   sum  favour
bc0            26   11   522   522     522
chkn           29    7   261   266     257
frg1           28    3    84    80      72
ibm            48   17   410   216     354
misj           35   14    39    39      39
x6dn           39    5   241   240     229
average time              0.9   1.0     3.5



In a second series larger benchmarks are considered for which the exact result cannot
be computed. The application to larger examples with 7 objectives to be optimised
showed very good results. They are summarised in Table 6.2. In this case (as can be
expected) the greedy approach performs even worse. Relation favour outperforms the
weighted sum on 50% of all examples considered.

It is important to notice that for the weighted sum approach a lot of “fine-tuning” was 
necessary, while relation favour runs without any further user interaction. 



6.2 Comparison to Relation Dominate 

We first applied the learning algorithms to the same (small) training set as above
(Table 6.3). After the optimisation the learning algorithm based on dominate
computes 14 different solutions that cannot be further compared. Relation favour
returned two elements only (see also Table 6.1). Regarding quality, there was not a
single component where one of the 14 elements outperformed one of the two. Besides
this, the learning algorithm based on favour was more than four times
faster, i.e. 2.5 CPU hours instead of 14.
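
The difference between the two relations can be made concrete with a small sketch. It assumes that relation favour prefers the solution that is better in more components than it is worse, which is how the relation is understood here; the ranking of whole populations is not reproduced. The vectors are the columns f1, f2 and d3 of Tables 6.1 and 6.3 (five node counts and the average time), used here only as stand-in objective vectors.

```python
from typing import Sequence


def dominates(x: Sequence[float], y: Sequence[float]) -> bool:
    """Pareto dominance for minimisation: no worse anywhere, better somewhere."""
    return (all(a <= b for a, b in zip(x, y))
            and any(a < b for a, b in zip(x, y)))


def favoured(x: Sequence[float], y: Sequence[float]) -> bool:
    """Assumed definition of favour: x wins more components than y does."""
    x_wins = sum(a < b for a, b in zip(x, y))
    y_wins = sum(b < a for a, b in zip(x, y))
    return x_wins > y_wins


f1 = (28, 163, 27, 54, 65, 0.9)
f2 = (28, 163, 30, 54, 65, 0.4)
d3 = (30, 163, 35, 54, 65, 2.5)

print(dominates(f1, f2), dominates(f2, f1))  # False False
print(favoured(f1, f2), favoured(f2, f1))    # False False
print(dominates(f1, d3), favoured(f2, d3))   # True True
```

In this reading f1 and f2 each win one component against the other, so neither is preferred, which is consistent with both being returned.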

When applying heuristic learning to construct heuristics in VLSI CAD it is very 
important that the number of final solutions is not too large, since finally the designer 
has to choose one. If the list becomes too long, it is not feasible to test them all, since 
the designs are too large. 
Table 6.3

name          in out   d1   d2   d3   d4   d5   d6   d7   d8   d9  d10  d11  d12  d13  d14   f1   f2
add6          12   7   42   63   30   62  260  256  300  185  256  132   52   67   51  310   28   28
addm4          9   8  163  191  163  187  245  198  189  181  198  231  201  167  163  206  163  163
cm85a         11   3   33   39   35   30   37   37   43   39   46   49   30   35   33   37   27   30
m181          15   9   60   74   54   55   87   87   84   74   80   83   54   60   61   86   54   54
risc           8  31   70   71   65   66   94   79   84   95   97   94   65   65   65   90   65   65
average time          1.9  1.8  2.5  1.9  1.2  1.2  1.2  1.2  1.2  1.2  2.7  1.9  3.3  1.2  0.9  0.4



To further study the selection process, we finally applied our technique to a larger
training set, where the algorithm based on relation dominate computed 16 solutions.
To this result we applied the relation favour and this reduced the set to one element.
More details are given in [Dre00].



6.3 Further Applications 

In the meantime, the MOO model based on relation favour has been used in many 
projects and has been included in the software environment GAME (Genetic 
Algorithm Managing Environment) - a software tool developed for applications in 
VLSI CAD using evolutionary techniques. 

A method for test pattern generation for digital circuits and a tool for detailed routing
of channels and switchboxes have been developed, underlining the flexibility of the
model (see [Dre98]).

7 Conclusion 

Recently, in [DDB99] a new model for Multi-Objective Optimisation has been
proposed that overcomes the limitations of classical EA approaches that often require
several runs to determine good starting parameters. Furthermore, the model gives very
robust results since the number of parameters is reduced without reducing the quality
of the result. Only “non-promising” candidates (for which it can be guaranteed that they
are not optimal and that better individuals already exist) are not considered.

In this paper, we gave a detailed description of the model. Several properties have
been discussed and advantages and disadvantages have been described. In particular,
we compared, from a theoretical and a practical point of view, the new model
to the weighted sum approach and to the relation dominate.

As an advantage it turned out that components with different measures do not have to
be scaled. This is done automatically when elements are compared with relation favour. This
may result in significant speed-ups in EAs and simplifies the handling of the
algorithms.
Abstract. We compare two multiobjective evolutionary algorithms
with deterministic gradient-based optimization methods for the dose
optimization problem in high dose rate (HDR) brachytherapy. The
optimization considers up to 300 parameters. The objectives are expressed
in terms of statistical parameters of dose distributions. These parameters
are approximated from dose values at a small number of points.
For these objectives it is known that the deterministic algorithms
converge to the global Pareto front. The evolutionary algorithms produce
only local Pareto-optimal fronts. The performance of the multiobjective
evolutionary algorithms is improved if a small part of the population is
initialized with solutions from deterministic algorithms. An explanation
is that only a very small part of the search space is close to the global
Pareto front. We estimate the performance of the algorithms in some
cases in terms of probability compared to a random optimum search
method.



1 Introduction 

High dose rate (HDR) brachytherapy is a treatment method for cancer in which
empty catheters are inserted within the cancer volume. Once the correct position
of these catheters is verified, a single $^{192}$Ir source is moved inside the catheters
to discrete positions (dwell positions) using a computer-controlled machine. The
problem which we consider is the determination of the n times (dwell times,
sometimes termed dwell weights or weights) for which the source is at
rest at each of the n dwell positions, so that the resulting three-dimensional dose
distribution fulfills defined quality criteria. In modern brachytherapy the dose
distribution has to be evaluated with reference to irradiated normal tissues and
the planning target volume (PTV), which includes the cancer volume and an
additional margin. For a more detailed description see [?]. The number of
source positions varies from 20 to 300. We consider the optimization of the dose
distribution using as objectives the variance of the dose distribution on the PTV
surface and in the PTV, calculated at 1500 to 4000 positions (sampling points).
For variances and in general for quadratic convex objective functions $f(x)$ of the
form

\[ f(x) = (Ax - d)^T (Ax - d) \tag{1} \]

it is known that a weighted sum optimization method converges to the global
Pareto front [?], where $A$ is a constant matrix and $d$ is a constant vector of the
prescribed dose values in the PTV target or surface.

We have successfully applied multiobjective evolutionary algorithms with
dose-volume based objectives [?]. In the past, comparisons of the effectiveness
of evolutionary algorithms have been made either with other evolutionary
algorithms [?] or with manually optimized plans [?]. We have compared
the Pareto fronts obtained by multiobjective evolutionary algorithms with the
Pareto fronts obtained by a weighted sum approach using deterministic optimization
methods such as quasi-Newton algorithms and Powell's modified conjugate
gradient algorithm, which does not require derivatives of the objective function [?].

We use here only objectives where gradient-based algorithms are superior.
However, we must also consider critical structures partly inside the target or
close to it, which have to be protected from excessive radiation. Other objectives
are the optimum position and the minimum number of sources. In such cases
the gradient-based algorithms cannot be used. Therefore, before applying
evolutionary algorithms to the solution of these more complex problems, we have
compared their efficiency with deterministic methods using as objectives only
the dose variance within the target and on its surface.



2 Methods 



2.1 Objectives 



We use as objectives the normalized variance of the dose distribution on the
PTV surface, $f_s$, and in the PTV, $f_v$:

\[ f_s = \frac{1}{m_s^2 N_s} \sum_{i=1}^{N_s} (d_i^s - m_s)^2 \tag{2} \]

\[ f_v = \frac{1}{m_v^2 N_v} \sum_{i=1}^{N_v} (d_i^v - m_v)^2 \tag{3} \]

where $m_s$, $m_v$ are the corresponding mean values, $N_s$, $N_v$ the number of points
used to estimate these parameters, and $d_i^s$, $d_i^v$ the dose value at the i-th point
on the surface and in the volume, respectively.

We use a weighted sum approach for the multiobjective optimization with the
deterministic algorithms, where for a set of weights for the volume and surface
variance we perform a single objective optimization of $f_w$:

\[ f_w = w_s f_s + w_v f_v \tag{4} \]

where $w_s, w_v \geq 0$ are the surface and volume importance factors, respectively,
and $w_s + w_v = 1$. We used 21 optimization runs where $w_s$ varied from 0
to 1 in steps of 0.05 to determine the shape of the trade-off curve.
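
As a rough illustration, the normalized variance and the weight sweep can be written as below. The function names and the dose samples are made up for this sketch, and the single-objective optimizer that would minimise each weighted objective is not shown.

```python
from typing import List, Sequence


def normalized_variance(dose: Sequence[float]) -> float:
    """Variance of the sampled dose values divided by the squared mean."""
    n = len(dose)
    mean = sum(dose) / n
    return sum((d - mean) ** 2 for d in dose) / (mean ** 2 * n)


def weighted_objective(w_s: float, f_s: float, f_v: float) -> float:
    """Weighted sum of both variances with w_v = 1 - w_s."""
    return w_s * f_s + (1.0 - w_s) * f_v


def surface_weights(steps: int = 20) -> List[float]:
    """Weights 0, 1/steps, ..., 1; steps=20 gives the 21 values used."""
    return [i / steps for i in range(steps + 1)]


surface = [0.9, 1.0, 1.1, 1.0]   # made-up dose samples
volume = [1.0, 1.5, 2.0, 1.5]
f_s, f_v = normalized_variance(surface), normalized_variance(volume)
curve = [weighted_objective(w, f_s, f_v) for w in surface_weights()]
print(len(curve))  # 21
```

Each entry of `curve` corresponds to one single-objective problem; in the study each of these would be minimised over the dwell weights.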

2.2 Genetic Operators 

We use a real representation for the chromosomes. The following variants of 
genetic operators have been used in this study. 

Uniform mutation. In uniform mutation, if $g_k$ is the element of a chromosome
selected for mutation, then it is replaced by a random number from the
interval [LB, UB], where LB and UB are the lower and upper bounds of the
element.
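
A minimal sketch of uniform mutation follows. It assumes, for brevity, the same bounds for all genes and a per-gene mutation probability `p_mut`; both are simplifications made for this illustration.

```python
import random
from typing import List, Optional


def uniform_mutation(chromosome: List[float], lb: float, ub: float,
                     p_mut: float,
                     rng: Optional[random.Random] = None) -> List[float]:
    """Replace each gene, with probability p_mut, by a draw from [lb, ub]."""
    rng = rng or random.Random()
    return [rng.uniform(lb, ub) if rng.random() < p_mut else gene
            for gene in chromosome]


rng = random.Random(0)  # fixed seed only to make the demo repeatable
parent = [0.2, 0.5, 0.8]
print(uniform_mutation(parent, 0.0, 1.0, p_mut=0.5, rng=rng))
```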

Non-uniform mutation. In non-uniform mutation, if $g_k$ is the element of a
chromosome at generation $t$, it is transformed into

\[ g_k' = \begin{cases} g_k + \Delta(t, UB - g_k) & \text{if } r_1 = 0 \\ g_k - \Delta(t, g_k - LB) & \text{else} \end{cases} \tag{5} \]

\[ \Delta(t, y) = y \left( 1 - r^{(1 - t/T)^b} \right) \tag{6} \]

where $r_1$ is a random bit (0 or 1), $r$ is a random number in the range [0, 1], $T$
is the maximal generation number and $b$ a parameter controlling the dependency
of $\Delta(t, y)$ on the generation number. The function $\Delta(t, y)$ returns a value in the
range $[0, y]$ such that the probability of $\Delta(t, y)$ being close to 0 increases as
$t$ increases. Initially, when $t$ is small, the space is searched uniformly, and at
later stages very locally [?].

Flip Mutation. In flip mutation, we randomly pick elements
(genes) $g_k$ to $g_j$ from the chromosome and then set those elements to the values
of other genes of this chromosome. This works for any number of gene sets of a
given chromosome.

Swap Mutation. In swap mutation, we randomly swap elements
of the chromosome. The number of swapped elements depends on the mutation
probability.
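
The non-uniform mutation can be sketched as follows, assuming the commonly used step function $\Delta(t, y) = y (1 - r^{(1 - t/T)^b})$ and a decrease towards the lower bound LB when the random bit is 1. The default `b = 2.0` is an arbitrary choice for this illustration, not a value taken from the study.

```python
import random
from typing import List, Optional


def delta(t: int, y: float, t_max: int, b: float, rng: random.Random) -> float:
    """Step size in [0, y]; shrinks towards 0 as t approaches t_max."""
    r = rng.random()
    return y * (1.0 - r ** ((1.0 - t / t_max) ** b))


def non_uniform_mutation(chromosome: List[float], k: int, t: int, t_max: int,
                         lb: float, ub: float, b: float = 2.0,
                         rng: Optional[random.Random] = None) -> List[float]:
    """Mutate gene k; the step is bounded by the distance to lb or ub."""
    rng = rng or random.Random()
    child = list(chromosome)
    if rng.randint(0, 1) == 0:  # random bit r_1
        child[k] += delta(t, ub - child[k], t_max, b, rng)
    else:
        child[k] -= delta(t, child[k] - lb, t_max, b, rng)
    return child


rng = random.Random(42)  # fixed seed only to make the demo repeatable
print(non_uniform_mutation([0.3, 0.7], k=0, t=10, t_max=100,
                           lb=0.0, ub=1.0, rng=rng))
```

At `t = t_max` the exponent becomes 0, the step is 0 and the gene is returned unchanged, which reflects the increasingly local search at late generations.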



Gaussian Mutation. This mutation operator adds a gaussian distributed ran- 
dom value to the chosen gene. The new value is clipped if it falls outside the 
specified lower and upper bounds for that gene. 
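
A minimal sketch of Gaussian mutation with clipping is given below. The standard deviation `sigma` of the added noise is a free parameter of the sketch; the text does not state which value was used.

```python
import random
from typing import List, Optional


def gaussian_mutation(chromosome: List[float], k: int, sigma: float,
                      lb: float, ub: float,
                      rng: Optional[random.Random] = None) -> List[float]:
    """Add N(0, sigma^2) noise to gene k and clip the result to [lb, ub]."""
    rng = rng or random.Random()
    child = list(chromosome)
    child[k] = min(ub, max(lb, child[k] + rng.gauss(0.0, sigma)))
    return child


rng = random.Random(1)  # fixed seed only to make the demo repeatable
print(gaussian_mutation([0.2, 0.5, 0.8], k=1, sigma=0.1,
                        lb=0.0, ub=1.0, rng=rng))
```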






N. Milickovic et al. 



Blend Crossover. In blend crossover we generate a uniform distribution
based on the distance between the parent values and then choose the offspring
value from that distribution. This distance is defined for each pair of
corresponding parent genes. If $g_1$ and $g_2$ are the parent chromosomes and $g_1'$ and
$g_2'$ are the offspring, then for the parents' i-th genes this distance is given by:

\[ dist = |g_{1i} - g_{2i}| \tag{7} \]

Further, the lower and upper bounds lo and hi are found as:

\[ lo = \min(g_{1i}, g_{2i}) - 0.5 \cdot dist, \qquad hi = \max(g_{1i}, g_{2i}) + 0.5 \cdot dist, \tag{8} \]

correcting those values if necessary so that they remain within the permitted gene-value
boundaries. Then $g_{1i}'$ and $g_{2i}'$ are determined as random numbers from the
range [lo, hi].
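
The blend crossover can be sketched as below. The sketch assumes common gene bounds `lb` and `ub` for all genes and implements the correction of lo and hi as simple clipping to these bounds.

```python
import random
from typing import List, Optional, Tuple


def blend_crossover(p1: List[float], p2: List[float], lb: float, ub: float,
                    rng: Optional[random.Random] = None
                    ) -> Tuple[List[float], List[float]]:
    """Draw both children gene by gene from the widened parent interval."""
    rng = rng or random.Random()
    c1, c2 = [], []
    for a, b in zip(p1, p2):
        dist = abs(a - b)
        lo = max(lb, min(a, b) - 0.5 * dist)  # clipped to the gene bounds
        hi = min(ub, max(a, b) + 0.5 * dist)
        c1.append(rng.uniform(lo, hi))
        c2.append(rng.uniform(lo, hi))
    return c1, c2


rng = random.Random(2)  # fixed seed only to make the demo repeatable
print(blend_crossover([0.2, 0.9], [0.4, 0.7], lb=0.0, ub=1.0, rng=rng))
```

When both parents carry the same value in a gene, the interval collapses and both children inherit that value.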

Geometric Crossover. In geometric crossover, if $a$ and $b$ are parent
chromosomes, then the new offspring $c_1$ and $c_2$ are:

\[ c_1 = \langle a_1, a_2, a_3, \ldots, a_n, \sqrt{a_{n+1} b_{n+1}}, \ldots, \sqrt{a_{2n} b_{2n}} \rangle \tag{9} \]

\[ c_2 = \langle b_1, b_2, b_3, \ldots, b_n, \sqrt{a_{n+1} b_{n+1}}, \ldots, \sqrt{a_{2n} b_{2n}} \rangle. \]

One of these offspring will be selected according to their fitness values.



Two Point Crossover. The two point crossover operator randomly selects two
crossover points within a chromosome. It then interchanges the segment of the two
parent chromosomes between these points (segment interchange) to produce the two
new offspring.
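
A minimal sketch of two point crossover on list-encoded chromosomes follows; drawing two distinct cut points uniformly is an assumption of the sketch.

```python
import random
from typing import List, Optional, Tuple


def two_point_crossover(p1: List[float], p2: List[float],
                        rng: Optional[random.Random] = None
                        ) -> Tuple[List[float], List[float]]:
    """Exchange the segment between two random cut points."""
    if len(p1) != len(p2):
        raise ValueError("parents must have the same length")
    rng = rng or random.Random()
    # Two distinct cut positions in 0..len, sorted so that i < j.
    i, j = sorted(rng.sample(range(len(p1) + 1), 2))
    c1 = p1[:i] + p2[i:j] + p1[j:]
    c2 = p2[:i] + p1[i:j] + p2[j:]
    return c1, c2


rng = random.Random(3)  # fixed seed only to make the demo repeatable
print(two_point_crossover([1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0], rng=rng))
```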



Arithmetic Crossover. If $g_1^t$ and $g_2^t$ are two chromosomes of the population
at generation $t$, then after arithmetic crossover two new chromosomes $g_1^{t+1}$
and $g_2^{t+1}$ at generation $t+1$ are produced:

\[ g_1^{t+1} = a g_1^t + (1 - a) g_2^t, \qquad g_2^{t+1} = a g_2^t + (1 - a) g_1^t \tag{10} \]

$a$ is either a constant (uniform arithmetical crossover), or a variable depending
on the generation number (non-uniform arithmetical crossover).
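
The arithmetic crossover amounts to two convex combinations of the parents, as the following sketch with a constant `a` shows.

```python
from typing import List, Tuple


def arithmetic_crossover(g1: List[float], g2: List[float],
                         a: float) -> Tuple[List[float], List[float]]:
    """Return the two convex combinations of the parents for weight a."""
    c1 = [a * x + (1.0 - a) * y for x, y in zip(g1, g2)]
    c2 = [a * y + (1.0 - a) * x for x, y in zip(g1, g2)]
    return c1, c2


print(arithmetic_crossover([0.0, 4.0], [4.0, 0.0], a=0.25))
# ([3.0, 1.0], [1.0, 3.0])
```

For `a` between 0 and 1 the children stay inside the gene bounds whenever both parents do, so no clipping is needed.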

2.3 Estimation of the Probabilities of Random Solutions 

Using random sets of decision variables (weights) we extrapolated the number
of function evaluations required by a random search method to obtain points on
the Pareto front. We have found that a two-dimensional normal density function
$f(\mathbf{x})$ can in some cases be used to describe the distribution of these random
points in the objective space, known also as the bi-loss map for two objectives [?].
The density function is:

\[ f(\mathbf{x}) = \frac{1}{2\pi \sqrt{|\Sigma|}} \exp\left( -\frac{1}{2} (\mathbf{x} - \boldsymbol{\mu})^T \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu}) \right) \tag{11} \]

where

\[ \Sigma = \begin{pmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{21} & \sigma_{22} \end{pmatrix} \tag{12} \]

with $\sigma_{11} = \sigma_x^2$, $\sigma_{22} = \sigma_y^2$ and $\sigma_{12} = \sigma_{21} = \mathrm{cov}(x, y) = E[(x - \mu_x)(y - \mu_y)] = \rho \sigma_x \sigma_y$.
$E(x)$ is the expectation value of $x$, $\sigma_x^2$ and $\sigma_y^2$ are the variances of $x$ and $y$, respectively,
and $\rho$ is the correlation coefficient.
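
For reference, a two-dimensional normal density with covariance $\rho \sigma_x \sigma_y$ can be evaluated as in the sketch below. It uses the standard closed form for the inverse and determinant of a 2x2 covariance matrix; the function and parameter names are chosen for this illustration.

```python
import math


def bivariate_normal_pdf(x: float, y: float, mu_x: float, mu_y: float,
                         sigma_x: float, sigma_y: float, rho: float) -> float:
    """Density of a 2D normal distribution at the point (x, y)."""
    cov = rho * sigma_x * sigma_y
    det = sigma_x ** 2 * sigma_y ** 2 - cov ** 2  # determinant of Sigma
    dx, dy = x - mu_x, y - mu_y
    # Quadratic form (x - mu)^T Sigma^{-1} (x - mu) for a 2x2 matrix.
    q = (sigma_y ** 2 * dx * dx - 2.0 * cov * dx * dy
         + sigma_x ** 2 * dy * dy) / det
    return math.exp(-0.5 * q) / (2.0 * math.pi * math.sqrt(det))


# At the mean of a standard, uncorrelated distribution the density is 1/(2*pi).
print(bivariate_normal_pdf(0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0))
```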



3 Results 

The convergence of the Strength Pareto Evolutionary Algorithm
(SPEA) is shown in Fig. 1. For a breast implant with 250 sources the population
averages of the surface and volume dose variances $\langle f_s \rangle$ and $\langle f_v \rangle$
are shown for 500 generations. Convergence is observed after 100 generations.




Fig. 1. An example of the convergence of < fs > and < fv > for the SPEA algorithm. 






In order to compare the efficiency of SPEA and the deterministic algorithms
we have evaluated the probability of obtaining at random a solution produced by
these algorithms. We generated 2500000 random sets of weights. The distribution
of these points in search space for a cervix implant with 215 sources is shown in
Fig. 2. We use Powell's optimization method [?] to fit the parameters $\sigma_{ij}$,
$i, j = 1, 2$, and $\mu$ of (11) and (12) to this distribution. The result is shown in Fig. 3,
where the fit and the experimental values for bins with a non-zero number of entries
are shown unfolded as a one-dimensional distribution.




Fig. 2. Probability $P(f_s, f_v)$ of generating a random point at $f_s \pm 0.05$, $f_v \pm 0.05$.



In Fig. 4 we show the probability distribution for the cervix implant of
obtaining a solution at a point $f_s \pm 0.05$, $f_v \pm 0.05$. The distribution is shown
for 100000 random solutions and for the non-dominated solutions from SPEA
and the deterministic algorithm.

The distribution of weights obtained by the gradient-based optimization
for the cervix implant is shown in Fig. 5. Only a small part of the weights is
significant.

Fig. 6, like Fig. 4, shows the result for an implant with 250 sources, where the
global Pareto front is closer to the majority of the random solutions in the search
space. Consequently the difference between the probabilities of the solutions of the
deterministic and the SPEA algorithm is smaller.

In all cases which we studied the Pareto fronts of the multiobjective
evolutionary algorithms (MOEAs) are local and lie between the global Pareto front found
by the deterministic algorithms and the majority of random solutions in the
search space. The shape of this set depends on the geometry of the implant and
the topology of the sources. For implants with a large number of sources the
majority of the random solutions is far from the global Pareto front.

Comparison of Evolutionary and Deterministic Multiobjective Algorithms

Fig. 3. A fit of a two-dimensional normal distribution. (Plot legend: 2D fitted
normal distribution, experimental; horizontal axis: index.)

Fig. 4. Probability distribution for a cervix implant with 215 sources for 100000
random, SPEA and deterministic algorithm solutions.

Fig. 5. Distribution of the weights for the deterministic algorithm for the importance
factors $w_v$ = 0.0 and 0.3, see [?].

Fig. 6. Probability distribution for a breast implant with 250 sources for 100000
random, SPEA and deterministic algorithm solutions.

K. Deb described some reasons why convergence to the true Pareto-optimal
front may not occur: multimodality, deception, isolated optimum and collateral
noise [?]. For the objectives used, multimodality can be excluded. The most
important reason is the isolated optimum. Since only an extremely small part of
the search space is located close to the true Pareto front, it is not possible for the
evolutionary algorithm to acquire information about the position of the Pareto
front from crossover and mutation. As described in [?], code substrings that have
above-average fitness values are sampled at an exponential rate for inclusion in
subsequent generations. This does not, however, imply that convergence to the
global Pareto front will occur. In contrast, the gradient-based algorithms use
the information from local gradients very efficiently and converge extremely fast
to the Pareto-optimal front. Collateral noise seems to be present when there
is excessive noise coming from other parts of the solution vector. Without
initialization the population moves smoothly and converges to a local Pareto
front. If a small part of the population is initialized with solutions from the
deterministic algorithm, then members of the population cover a much larger
part of the search space. This shows that the solutions require an extremely fine
tuning of the decision variables which the conventional genetic operators used
in this study cannot handle.

We analyzed the influence of the different mutation and crossover operators
described in Methods on the efficiency of different evolutionary
algorithms. In all cases a crossover probability of 0.85 and a mutation
probability of 0.0065 were used.

The best coverage of objective space was obtained by using the geometric 
crossover. The mutation operator did not influence the evolutionary algorithm 
efficiency as much as the crossover. 

In order to compare the effectiveness of the evolutionary algorithms we
generated 100000 points in the objective space from a corresponding number of
random sets of weights. The efficiency of evolutionary algorithms depends strongly on
how far these random solutions are from the global Pareto front. In Fig. 7 the
case of a prostate implant with 20 sources is presented. We can conclude that
the random solutions cover the objective space very satisfactorily and approach
the Pareto front. This means that the evolutionary algorithms should be able
to produce Pareto sets which converge toward the deterministic solutions.
In a different case, see Fig. 8, evolutionary algorithms could only theoretically
approach the deterministic Pareto set after an extremely large number of
generations. The Pareto set obtained by the evolutionary algorithms converges toward
the global Pareto set better in the direction in which the majority of random
solutions is nearer to the global Pareto set.

If the majority of random solutions in objective space is far
from the global Pareto front, then the evolutionary algorithms produce only local
Pareto sets. An optimal set of parameters and genetic operators does not improve
the convergence to the global Pareto set as significantly as an initialization with
solutions from the deterministic algorithm. With only four deterministic
solutions, see Figs. [?] and [?], the evolutionary algorithms reproduce the global Pareto
front. We have found that the objective functions used in the deterministic
algorithms for this initialization need not be exactly the same as those used by the
evolutionary algorithms. For example, the avoidance of high dose values inside
the PTV can be satisfied by a small dose variance inside the PTV. Another
objective function which can be used for this purpose is a penalty function
which penalizes solutions with dose values above a given limit. In this case the
deterministic gradient-based algorithms cannot be applied, due to the presence
of local minima.

We analyzed the efficiency of different evolutionary algorithms compared
with deterministic ones. The Pareto sets produced by SPEA, FFGA, FFGA with
elitism and the niched Pareto algorithm (NPGA) [?], and the Pareto
set evaluated by the Fletcher deterministic method, are compared. The Pareto fronts
from SPEA and FFGA with elitism converge to the deterministically
evaluated Pareto front much better than those of FFGA and NPGA, see Fig. [?]. The main
reason for this is the elitism implemented in the former two methods. In
multiobjective evolutionary algorithms it is important to save the nondominated
solutions. Even if the previously described initialization is
applied, the non-elitist algorithms do not produce better results, as the
algorithm does not "remember" the external nondominated set. This means that
an initialization of evolutionary algorithms requires the inclusion of elitism. For
NPGA and FFGA an additional problem in comparison to SPEA is that a value
for the sharing radius is required, which can vary from case to case.

A problem of the SPEA algorithm is that the extension of its Pareto front
is not as large as that of FFGA. It does not cover the "ends", see Figs. [?] and [?], as
the extension requires a fine tuning of the weights which cannot be reached by
the evolutionary algorithms unless an additional initialization is made first.
We usually run the algorithms for up to 500 generations, although the population
converges to a local Pareto front after 100 generations, as shown in Fig. 1. The
Pareto front is not significantly modified even if the calculation is extended
to 10000 generations, as was done in some cases. For implants with 200 or more sources,
a computation with 10000 generations requires up to 10 hours. We solved
this problem by initializing four members of the initial population with
solutions from the deterministic algorithm. In this case, the algorithm converges to
the deterministically evaluated Pareto set in less than 100 generations. Even if we
could use a weighted sum approach with deterministic algorithms for other
objectives, this would require a very large number of weights to reconstruct
the Pareto set, especially with an increasing number of objectives. Here the
evolutionary algorithms in combination with deterministic methods are more effective
in generating nondominated solutions.
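
The initialization of a few population members with deterministic solutions can be sketched as follows. The function name, the gene bounds and the uniform random fill for the remaining members are assumptions made for this illustration; the seed vectors are made up.

```python
import random
from typing import List, Optional, Sequence


def seeded_population(size: int, n_genes: int,
                      seeds: Sequence[Sequence[float]],
                      lb: float = 0.0, ub: float = 1.0,
                      rng: Optional[random.Random] = None) -> List[List[float]]:
    """Start with the given seed solutions and fill up with random members."""
    if len(seeds) > size:
        raise ValueError("more seeds than population members")
    if any(len(s) != n_genes for s in seeds):
        raise ValueError("seed length does not match n_genes")
    rng = rng or random.Random()
    population = [list(s) for s in seeds]
    while len(population) < size:
        population.append([rng.uniform(lb, ub) for _ in range(n_genes)])
    return population


# Four made-up seed vectors standing in for deterministic solutions.
seeds = [[0.1, 0.2, 0.3], [0.3, 0.2, 0.1], [0.5, 0.5, 0.5], [0.0, 1.0, 0.0]]
population = seeded_population(size=20, n_genes=3, seeds=seeds)
print(len(population), population[0])  # 20 [0.1, 0.2, 0.3]
```

As the text notes, such seeding only helps if the algorithm is elitist, since otherwise the seeded nondominated members can be lost again.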
Fig. 7. An example of the distribution of 100000 random solutions in objective space
for a prostate implant with 20 sources. The Pareto set obtained by the deterministic
algorithm is presented by the line. The nondominated solutions of SPEA and FFGA
with elitism are shown without initialization. For SPEA the Pareto set is additionally
shown for the case in which four members of the population are initialized with
solutions from a deterministic algorithm.



4 Conclusions 

We have compared standard multiobjective evolutionary algorithms with
deterministic optimization methods, using objectives for which a weighted sum
approach can be used to obtain the global Pareto fronts. This comparison was
done for the dose optimization problem in HDR brachytherapy. The number of
decision variables can be as high as 300.

From the distribution of randomly generated weights we were able to estimate
in some cases the probabilities of randomly generating a point in the objective
space. This enabled us to estimate the performance of the multiobjective
evolutionary and the deterministic algorithms in comparison to a random search. The
evolutionary algorithms have been found to be a factor of 10® to 10^^ more effective
than a random search. The deterministic algorithms are more efficient still, by a
factor which exceeds in some cases 10®°. This could explain why evolutionary
algorithms were not able to generate solutions close to the global Pareto front.

The Pareto front reached depends on the probability of generating a point
in the objective space. The evolutionary algorithms with the standard genetic
operators described in this work are not able to significantly improve the
performance. An initialization from deterministic algorithms improves the
performance and helps to reconstruct the Pareto front around the initial seeds of the
deterministic algorithm.
Fig. 8. An example of the SPEA algorithm for the case of a breast implant with 250
sources. The majority of random solutions in the objective space is very far from the
global Pareto front. Without initialization the evolutionary algorithms do not reach the
Pareto front. The path of the deterministic algorithm is shown for a set of fixed weights.

Our previous results for evolutionary algorithms with objectives for which
gradient-based deterministic algorithms cannot be used showed that the results were
comparable to or even better than those of other phenomenological dose optimization
methods [?]. With this study we have found that if a part of the population is
initialized with a good initial estimate of the Pareto front, the results of the
evolutionary algorithms improve significantly more than by any optimization
of the GA parameters.

We do not know if there are special genetic operators which, if applied, could
fill the gap between the Pareto fronts found by the evolutionary and the
deterministic algorithms. It seems that the global solutions require a fine tuning of the
decision variables which is far beyond what evolutionary algorithms can achieve
in an acceptable number of generations. If a part of the population reaches a part
of this region via initialization, then the evolutionary algorithm is able to find
solutions around this region.

A weakness of the MO evolutionary algorithms is the large number of
function evaluations required to obtain a reasonably good local Pareto front. In the
past, MOEAs were compared for problems involving only very few decision
variables. In such cases a few thousand random sets of decision variables
cover a large part of the objective space, and evolutionary algorithms are able
to quickly produce solutions very close to the global Pareto front. A larger population
size increases the efficiency of the MOEAs by generating solutions
closer to the global Pareto set, although population sizes of more than a few
hundred are not very practical or useful. In problems like brachytherapy with
up to 300 decision variables, points close to the global Pareto front are in some
cases of extremely low probability. In this case the evolutionary algorithms
cannot reach these regions, whereas deterministic algorithms using information from
local gradients are guided into these regions after only 10-20 iterations. In each
such iteration a line minimization is performed a few times.

Fig. 9. An example of the SPEA algorithm for the case of a breast implant with 250
sources. The convergence in the objective space is presented at different numbers of
generations t.

We have considered here only a simple coupling, in which the deterministic algorithm
delivers some optimal solutions as a starting point. We have found that the
performance of the MOEA algorithms is significantly enhanced even if the initial
solutions are obtained using different objectives for the deterministic algorithm.
In the past, algorithms were proposed in which the evolutionary algorithms produce
starting points for deterministic algorithms. Another possibility is to use a hybrid
version of the evolutionary and deterministic algorithms. There are versions in which
a hill-climbing operator is applied with some probability, which could be adapted
to the performance of the algorithm.

In some cases we have estimated the performance of the MOEAs from Monte Carlo
sampling experiments. In the future we will consider other approaches, such as
multiobjective simulated annealing, in which an external population is filled with
the nondominated solutions found. This population is used in the optimization
process by picking random members as starting points. We have to examine whether
such an algorithm with a single member can produce better results than MOEAs,
whose performance is assumed to be explained by mechanisms such as implicit
parallelism.

Abstract. This paper studies the influence of what are recognized as key issues
in evolutionary multi-objective optimization: archiving (to keep track of the cur- 
rent non-dominated solutions), elitism (to let the archived solutions take part in 
the search process), and diversity maintenance (through density dependent se- 
lection). Many proposed algorithms use these concepts in different ways, but a 
common framework does not exist yet. Here, we extend a unified model for multi- 
objective evolutionary algorithms so that each specific method can be expressed as 
an instance of a generic operator. This model forms the basis for a new type of em- 
pirical investigation regarding the effects of certain operators and parameters on 
the performance of the search process. The experiments of this study indicate that 
interactions between operators as well as between standard parameters (like the 
mutation intensity) cannot be neglected. The results lead not only to better insight 
into the working principle of multi-objective evolutionary algorithms but also to 
design recommendations that can help possible users in including the essential 
features into their own algorithms in a modular fashion. 



1 Introduction 

Evolutionary algorithms have been shown to be a useful auxiliary tool for approximating
the Pareto set of multi-objective optimization problems. Several surveys of evolutionary
multi-objective algorithms can be found in the literature, which reflects the large
number of different evolutionary approaches to multi-objective optimization proposed to
date. While most of these algorithms were designed with regard to two common goals, fast
and reliable convergence to the Pareto set and a good distribution of solutions along the
front, virtually each algorithm represents a unique combination of specific techniques to
achieve these goals.

As a consequence, there are no common guidelines available on how best to tailor an
evolutionary algorithm to an application involving multiple objectives. Recently, some
researchers have tried to address this problem by carrying out extensive comparative
case studies, which can roughly be divided into two categories. The first group
compares different algorithms, but as the algorithms usually differ in more than just
one aspect, it is very difficult to identify the features which are mainly responsible
for the better performance of one algorithm over another. In contrast, a few other
studies take one algorithm and focus on a specific operator or parameter to tune, e.g.
the selection method. In this case the results are valid only for the algorithm under
consideration and are highly dependent on the other algorithmic parameters. Hence, it
has remained open up to now

- how a certain parameter or a certain operator affects the overall performance
independent of the specific implementation and the other techniques used, and

- how the parameters and operators influence each others’ performance. 

The reason why these questions cannot be answered easily is the vast number of details
by which the various implementations differ. To overcome this difficulty, in a first step
we have developed a unified model representing a general structure that most algorithms
have in common; this allows us to assess important algorithmic features by separating
them from the implementation-specific details. Based on this model, we here investigate
archiving, elitism, and density estimation methods and present an appropriate statistical
methodology to identify significant effects and interactions, which is carried out
for the example of the knapsack problem.

Section 2 starts with the classification of the key concepts in evolutionary
multi-objective optimization and the description of the model. The experimental design
is explained in Section 3. In the remaining parts, experiments focus on the key
parameters and operators in multi-objective evolutionary algorithms: Section 4 describes
the notion of elitism and investigates the benefits of elitism and the interactions with
mutation. In Section 5 we deal with archiving methods and different possibilities to
limit the size of the archive. In Section 6 we investigate different density estimation
techniques and how they can be used to guide the selection of parents for the next
generation. Finally, we discuss the results and their implications for the design of
multi-objective evolutionary algorithms.

2 Background 

2.1 Issues in Evolutionary Multi-objective Optimization 

Fig. 1 gives an overview of the different techniques that are applied in most
multi-objective evolutionary algorithms. Though some are also used in single-objective
optimization (e.g. fitness sharing), most of the features are especially devoted to the
task of maintaining a diverse set of solutions in a multi-dimensional objective space.

Many multi-objective evolutionary algorithms maintain a secondary population - or 
archive - parallel to the normal offspring population IIT/I pp. 3-26-3-28]. Generally, an 
archive can be used to store any kind of useful information gathered during the run, but 
in most cases it just contains non-dominated solutions and therefore approximates the 
Pareto set. If the archived solutions reproduce as well, we say the algorithm uses elitism. 
Recent studies suggest that the use of elitism improves multi-objective evolutionary
algorithms.
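
The archive described above can be sketched in a few lines. This is only an illustrative sketch, not the authors' implementation: the names `dominates` and `update_archive` are hypothetical, objective vectors are plain tuples, and all objectives are assumed to be maximized (as in the knapsack problem used later).

```python
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]


def dominates(u: Sequence[float], v: Sequence[float]) -> bool:
    """True if u Pareto-dominates v, assuming every objective is maximized."""
    return all(a >= b for a, b in zip(u, v)) and any(a > b for a, b in zip(u, v))


def update_archive(archive: List[Vector], candidate: Vector) -> List[Vector]:
    """Offer one candidate to an unbounded archive of non-dominated vectors."""
    # Reject the candidate if an archived vector dominates or duplicates it.
    if any(dominates(a, candidate) or a == candidate for a in archive):
        return archive
    # Otherwise drop everything the candidate dominates and store it.
    return [a for a in archive if not dominates(candidate, a)] + [candidate]
```

Feeding offspring through `update_archive` one at a time keeps the archive equal to the non-dominated subset of everything seen so far; whether archived solutions also reproduce (elitism) is a separate decision.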

Another issue is the assignment of fitness values to the individuals. In this study we
concentrate on Pareto-based methods because of their acknowledged advantages over
aggregation and population-based methods. Different techniques inferring a scalar
value from the partially ordered objective vectors include the dominance level (or
non-dominated sorting), the dominance grade or rank, and the 'strength' measure.
Some algorithms use a further means to reach a better or more uniform distribution

Fig. 1. A taxonomy of techniques in multi-objective evolutionary algorithms (recoverable
box labels: Archiving, Primary Fitness, Secondary Fitness); the arrows indicate
existing combinations.



of solutions: Individuals receive a secondary fitness value, mainly based on the density 
of search points near them. This density value can then be used to modify the primary 
fitness, e.g. via fitness sharing. 

2.2 The Unified Model 

The framework we use in this study is based on the Unified Model for Multi-objective 
Evolutionary Algorithms (UMMEA [5i]), which is characterized by a decomposition of 
algorithms into abstract operators. Existing techniques can be classified, formalized, 
and mapped onto the operators. Hence, it is not only possible to emulate the behavior 
of specific algorithms, but also to combine the different techniques arbitrarily, which is 
important for our statistical analysis. 

The main concept of this model is the coexistence of a 'normal' offspring population
(B, |B^t| = b) and an archive of 'elite' individuals (A, |A^t| ≤ a), which represents
the current approximation of the Pareto set. All modifications of these multi-sets of
individuals can be expressed by (stochastic) operators that can be arbitrarily combined
in a modular fashion. Fig. 2 shows the generic algorithm based on this model (left) and
a schematic view of the transition from one generation to the next (right). The
initialize operator usually sets the archive to the empty set and the population to a
random sample of the search space. If the archive size is bounded (1 ≤ a < ∞), a
strategy must be defined to deal with more non-dominated solutions than can be stored.
This is done by the truncate operator.

The ’elitism intensity’ is modeled by the parameter pe, which represents the proba- 
bility to select a parent from the archive instead of the normal population. 
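
The role of pe can be illustrated with a small sketch. It is a simplification made for illustration only: within each set the parent is drawn uniformly, whereas the model itself uses the target sampling rates computed by the evaluate operator. The function name `pick_parent` and the fallback for an empty archive are assumptions.

```python
import random


def pick_parent(archive, population, p_e, rng=random):
    """Draw one parent: from the archive with probability p_e, else from the population.

    Falls back to the population while the archive is still empty (an assumption).
    """
    if archive and rng.random() < p_e:
        return rng.choice(archive)
    return rng.choice(population)
```

With `p_e = 0` the archive only records results and never reproduces; with `p_e = 1` all parents come from the archive.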

As long as the termination predicate is not satisfied, the archive and the population are 
evolved. In each iteration, the update operator updates the archive with individuals from 
the current offspring population; usually, only non-dominated solutions are stored. In the 
adapt operator, possible control mechanisms for the elitism intensity can be modeled, 
but in this study we leave pe constant during the run. 

Fig. 2. Left: The general algorithm based on the unified model; A^t denotes the archive,
B^t the population and pe^t the elitism intensity at generation t. Right: A schematic
view of the transition from one generation to the next.

The evaluate operator calculates the target sampling rates for each individual
depending on the individuals in the archive and the population. For example, the primary
fitness of individuals is derived from their rank with respect to their dominance level,
usually referred to as 'non-dominated sorting'. From these ranks, the sampling rates
of an individual in a set of individuals (population) B are calculated to simulate binary
tournament selection.
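
As an illustration of the dominance level, the following sketch peels off successive non-dominated fronts. It is a straightforward quadratic-time version written for clarity, assuming maximization; the function names are hypothetical, and the conversion of levels into tournament-equivalent sampling rates is not shown.

```python
def dominates(u, v):
    """True if u Pareto-dominates v, assuming every objective is maximized."""
    return all(a >= b for a, b in zip(u, v)) and any(a > b for a, b in zip(u, v))


def dominance_levels(points):
    """Return one level per point: 0 for the non-dominated front, 1 for the next, ..."""
    levels = [None] * len(points)
    remaining = set(range(len(points)))
    level = 0
    while remaining:
        # Points not dominated by any other point that is still unassigned.
        front = {
            i for i in remaining
            if not any(dominates(points[j], points[i]) for j in remaining if j != i)
        }
        for i in front:
            levels[i] = level
        remaining -= front
        level += 1
    return levels
```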

According to these sampling rates, the sample operator then selects the required
number of individuals from both the archive and the population as parents of the next
generation. These parents are finally modified in the vary operator, where both
recombination and mutation take place. In this study we use one-point crossover and
constant mutation rates σ_mut.
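
A minimal sketch of such a vary step on bit strings is given below. The function names are hypothetical, individuals are assumed to be lists of 0/1 of equal length (at least 2), and `rate` plays the role of the constant per-bit mutation probability.

```python
import random


def one_point_crossover(a, b, rng):
    """Swap the tails of two equal-length bit lists at a random interior cut point."""
    cut = rng.randrange(1, len(a))  # cut strictly inside the string
    return a[:cut] + b[cut:], b[:cut] + a[cut:]


def mutate(bits, rate, rng):
    """Flip each bit independently with probability `rate`."""
    return [1 - x if rng.random() < rate else x for x in bits]
```

With `rate = k / n` for a string of length n, the expected number of flipped bits per individual is k.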

3 Experimental Design 

Our aim is to investigate how the quality of the Pareto set approximation of the algorithms 
depends on certain algorithmic variables or parameters. This is a challenging task since 
the design space of the algorithm is huge and the performance indicators are noisy. In 
this study we pursue a novel approach which will be explained next, followed by short
descriptions of the test problem and the performance measure.

3.1 Methodology 

In the design of an algorithm and specifically in the instances of our unified model,
we face different types of design variables. In order to investigate the effect of these 
(independent) variables on the performance of the algorithm, we want to deal with all 
of them simultaneously in a common framework. For this purpose we distinguish 

- ordinal, continuous variables (like the elitism intensity pe) 

- ordinal, discrete variables (like the population size) 

- categorical variables (like different operators) 

Categorical variables are mainly given by different operator instances, e.g. the archive
limitation strategy in truncate.

For each experiment, we first decide on the variable(s) whose influence is to be
assessed, and fix all other variables to some standard or optimal value from previous
experiments (if those are available). The free variables are then varied randomly in their
respective domain, and from each setting one observation of the performance indicator
is drawn, i.e. the EA is run once with each setting.¹

After a large amount of data has been collected in this manner (usually 1000
observations per experiment), we look for a linear model that yields a good description
of the data. The general form of this model for the i-th observation is

$$V_i = a_0 + a_1 x_i^{(1)} + a_2 x_i^{(2)} + \ldots + a_n x_i^{(n)} + E_i \qquad (1)$$

where $x_i^{(1)}, \ldots, x_i^{(n)}$ are the explanatory variables and $V_i$ is the
response (or target variable), in our case the performance value. The $E_i$ are the
random errors caused by the randomness in the algorithm itself.²

The variables $x^{(j)}$, $1 \le j \le n$, of the model can be any transformation or
combination of original variables. Thus, the model is only linear in the coefficients,
and non-linear dependencies on the variables (which are very likely for complex systems
like evolutionary algorithms) can easily be traced. It furthermore allows us to include
the categorical variables, which have no natural value or order. This is done by indicator
variables, where one variable is used for each category. If an observation falls into a 
specific category, the respective indicator variable is set to one, otherwise to zero. The 
coefficient of this indicator variable then shows the difference in the response that is 
caused by this category. 
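
To make the use of transformed and indicator variables concrete, here is a small, self-contained sketch of fitting such a model by least squares. Everything in it is illustrative: the choice of terms (pe and pe squared), the function names, and the use of the first operator as baseline category (so that the indicators are not collinear with the intercept) are assumptions, not the models fitted in this study.

```python
def design_row(p_e, operator, operators):
    """Intercept, p_e, p_e**2 and one 0/1 indicator per non-baseline operator."""
    indicators = [1.0 if operator == name else 0.0 for name in operators[1:]]
    return [1.0, p_e, p_e ** 2] + indicators


def least_squares(rows, y):
    """Solve the normal equations (X^T X) a = X^T y by Gauss-Jordan elimination."""
    k = len(rows[0])
    # Augmented matrix [X^T X | X^T y].
    m = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * v for r, v in zip(rows, y))]
        for i in range(k)
    ]
    for col in range(k):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, k), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(k):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * p for x, p in zip(m[r], m[col])]
    return [m[i][k] / m[i][i] for i in range(k)]
```

The coefficient returned for an indicator is the estimated shift in the response caused by that operator relative to the baseline, with the other terms held fixed.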

If only categorical variables were used, this method would be equivalent to analysis
of variance (ANOVA), which has been applied to parameter studies of evolutionary
algorithms by Schaffer et al. [li]. Here, we try to keep variables in their respective
domain and want to use the order of the variables wherever possible. From the models
we can then identify which variables significantly affect the algorithmic performance.

3.2 Multi-objective 0/1 Knapsack Problems 

To study the performance of different algorithmic configurations, the multi-objective
0/1 knapsack problem is used, which has been subject to recent empirical case studies,
both in the evolutionary computation community and in the field of multiple criteria
decision analysis.

The multi-objective 0/1 knapsack problem is a function of the form k : {0, 1}^n → R^m,
where n is the number of decision variables and m the number of objectives. In
this work we refer to the definition from the literature and use the same parameters
for the weights

¹ We prefer this over doing replications with identical settings, for it leads to a
better distribution of samples in the design space of the algorithm.

² In order for a linear regression to be viable, the random errors are assumed to be
independent and identically distributed according to a normal distribution with zero
mean and equal variance. In this study we verified these assumptions using graphical
diagnosis tools for the residuals $R_i = V_i - \hat{V}_i$, like the Normal plot and the
Tukey-Anscombe plot. Generally, an appropriate transformation of the target variable V
led to the desired results.
and profits of the items and the same constraint handling technique. Here we restrict
ourselves to the bi-objective case as a baseline for further comparison; an extension of
the experiments to m > 2 is straightforward. The number of decision variables and
the allowed number of objective function calls are given in Table 1. Each individual
represents one decision alternative as a bit string of length n.
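
The following sketch shows how a bit string could be evaluated on such a problem. It assumes the common formulation with one knapsack per objective, each with its own profit row, weight row and capacity; the data layout and the name `evaluate_knapsack` are hypothetical, and the constraint handling technique used in the study is not reproduced here (infeasibility is only reported).

```python
def evaluate_knapsack(bits, profits, weights, capacities):
    """Return (profit vector, feasible) for a 0/1 selection of items.

    profits[j][i] and weights[j][i] refer to item i in knapsack j.
    """
    totals = tuple(sum(p for p, x in zip(row, bits) if x) for row in profits)
    loads = [sum(w for w, x in zip(row, bits) if x) for row in weights]
    feasible = all(load <= cap for load, cap in zip(loads, capacities))
    return totals, feasible
```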

3.3 Performance Measure 

For the quality or performance measure V we apply a volume-based approach with slight
modifications. Here, a reference volume between the origin and a utopia point, defined
by the profit sums of all items in each objective, is taken into account. The aim is to
minimize the fraction of that space which is not dominated by any of the final archive
members. We consider this the most appropriate scalar indicator since it combines both
the distance of solutions (towards some utopian trade-off surface) and the spread of
solutions.
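
For the bi-objective case, a measure of this kind can be computed with a simple sweep. The sketch below is an assumption-laden illustration (two maximized, non-negative objectives; hypothetical function name), not the exact indicator with the modifications used in the study.

```python
def undominated_fraction(archive, utopia):
    """Fraction of the box [0, u1] x [0, u2] not dominated by any archive point."""
    covered = 0.0
    best_y = 0.0
    # Sweep from the largest first objective downwards; each point that raises
    # the best second objective adds a new horizontal strip of width x.
    for x, y in sorted(archive, reverse=True):
        if y > best_y:
            covered += x * (y - best_y)
            best_y = y
    return 1.0 - covered / (utopia[0] * utopia[1])
```

Smaller values are better: 1.0 corresponds to an empty archive and 0.0 to an archive containing the utopia point itself.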

4 Elitism 

This section focuses on the influence of elitism and its possible interdependence with
the mutation intensity. To isolate these effects, no truncation of the archive (a = ∞)
and no density dependent selection is applied, i.e. the evaluate operator assigns
sampling rates only on the basis of the dominance level.

Here, elitism is assumed in the sense that the best individuals are not only stored
permanently but also take part in the selection of offspring. Our previous study
indicated that there may be interaction between the mutation strength and the benefit
of elitism. As in the original version of UMMEA, the elitism intensity is characterized
by a parameter pe ∈ [0, 1], which basically represents the probability to select a parent
individual from the archive instead of the previous offspring population. This concept
allows us to emulate the different elitist multi-objective evolutionary algorithms which
have been proposed so far.

4.1 Experiments 

In the first experiment, we vary pe randomly and uniformly in [0, 1] and the mutation
rate between 1/n and 10/n. The regression analysis shows significant interactions between
the elitism intensity pe and the normalized mutation rate σ = σ_mut · n, where σ_mut
denotes the mutation probability for each of the n bits; σ thus gives the expected number
of changed bits per mutation of an individual. The interactions are clearly visible in
Fig. 3. For low mutation rates σ < 2, the performance decreases monotonically with
pe, while for σ > 5 the performance increases with pe. The region around the critical
mutation rate σ ≈ 3 has the lowest variance in performance. However, the best results


Table 1. Different instances of the knapsack problem and default parameters for this study.
(Recoverable row label: maximal objective function calls.)

Fig. 3. Interaction plots for the combined influence of the mutation rate σ and the elitism
intensity pe on V for the knapsack problem with n = 750. The median for each combination
is shown.



are achieved on either side: there is a local optimum at σ ≈ 1 and pe ∈ [0, 0.1] and
a global optimum around σ ≈ 6 and pe = 1, while their difference seems to increase
with n (see Fig. 4). These results show that elitism is not always beneficial; it can even
be harmful if the mutation rate is too low, which is surprisingly the case for the popular
rule σ_mut = 1/n. Nevertheless, the best results are achievable for pe = 1.

At this point we refer to two other studies which compared different algorithms on
the same test problems. In [20], a constant mutation rate of σ_mut = 0.01 was used for
all algorithms, and the advantage of the elitist SPEA (Strength Pareto EA, [20]) became
more visible with increasing problem size. This may be explained as follows: A mutation
rate of 0.01 leads for n = 250 to σ = 2.5, which is still not high enough to make the
influence of elitism strong. For n = 500 and n = 750, however, the values are σ = 5
and σ = 7.5, respectively, and thereby in a range where elitism is very important. In [8]
the performance of SPEA was improved by increasing the archive size relative to the
population and (due to the specific selection method used) implicitly the elitism intensity
from 0.36 to 0.96. According to our findings this is crucial especially when a mutation
rate of 0.01 is used on the knapsack problem with n = 750.
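
The normalized mutation rates quoted above follow directly from multiplying the per-bit mutation probability of 0.01 by the string length n:

```latex
\sigma = \sigma_{mut} \cdot n: \qquad
0.01 \cdot 250 = 2.5, \qquad
0.01 \cdot 500 = 5, \qquad
0.01 \cdot 750 = 7.5
```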

These observations support the assumption that non-elitist EAs only work well if the
mutation strength lies under a certain threshold. Otherwise selection is no longer
able to compensate for the low success rate of mutation in creating dominating
individuals, which usually decreases as the mutation rate increases. For multi-objective
problems,
where the success rate also decreases with the number of objectives, this can pose a 
further problem: For high dimensional objective spaces, no mutation rate may exist 
for which non-elitist algorithms lead to positive progress, unless the population size is 
very large and a high selection pressure is used. On the other hand, elitist EAs have 
a significantly higher optimal mutation rate, which in turn increases with the elitism 
intensity. Here, the archived solutions preclude divergence. 

4.2 Key Results 

The main observation is that the usefulness of elitism strongly depends on the mutation 
strength: 

Fig. 4. Estimated response surface of the performance measure V to the normalized mutation
rate σ and the elitism intensity pe for n = 750. Two local minima arise at (pe = 1, σ ≈ 6)
and (pe = 0, σ = 1). A third order orthogonal polynomial was used as the model; the
coefficients were estimated via linear regression.



- If the usually recommended mutation rate σ = 1 is taken, the best performance is
achieved without elitism (pe = 0).

- The best overall performance can be reached with strong elitism (pe > 0.7) in
combination with high mutation rates (σ ≥ 5).

- The combination of strong elitism with low mutation rates or weak elitism with high
mutation rates shows the worst performance.



5 Bounding the Archive Size 

In the previous section the influence of the mutation rate and the elitism intensity has 
been explored under the assumption that the archive is unbounded and can contain all 
non-dominated individuals found so far. In some cases, however, it may be desirable - 
or even mandatory - to limit its size for several reasons: 

- The size of the true non-dominated set of the multi-objective problem may be
exponentially large or even infinite.

- All implementations are restricted to limited resources (i. e. storage space). 

- The complexity of the archive updating operator increases with the archive size. 

- Genetic drift can occur since over-represented regions of the search space are favored 
in the (uniform) sampling process. 

While the first three points mean that one has to limit the archive size for practical 
considerations though it would ideally be unlimited, the last point indicates that a (useful) 
truncation of the archive may also lead to a performance gain. In the following we set 
the maximum archive size a := b and examine how different archive truncation methods 
affect the algorithmic performance. 

5.1 Truncation Operators

Rudolph and Agapie provide theoretical results about convergence properties of
different archiving and selection strategies for multi-objective optimizers in finite
search spaces. The authors state that algorithms with unlimited archive sizes do have the
desired convergence properties provided that the variation operators match certain
preconditions. However, such algorithms are usually not of practical relevance because of
limited resources. Instead, the authors propose an updating operator that respects a
maximum archive size via a strong elite preserving strategy, thus keeping the desired
convergence properties. In our experiment this operator is named truncate_1, and it
ensures that for each dominated former archive member, at least one dominating individual
is included in the new archive. In contrast, the operator truncate_2 just makes a random
choice of which individuals to delete.

Another possibility to reduce the archive size is clustering: The possible members of
the archive are grouped into distinct clusters based on some similarity measure. Then,
each cluster is represented in the archive by certain individuals. Clustering-based
approaches are not strongly elite preserving, and they can be very time consuming.
Many algorithms are based on iterative improvement of a given partitioning according to
a predefined necessary optimality condition and can therefore lead to partitions which
are only locally optimal. Clustering-based archive reduction is used in SPEA [20]: The
individuals are clustered by the average linkage method into a distinct clusters (a being
the maximum archive size), from each of which the centroid individual is included in the
new archive. Here, we implement this as the truncate_3 operator.

Other approaches to limit the archive size can roughly be categorized as density
dependent ranking techniques. We will discuss density estimation later in the context
of biasing selection. The idea behind this concept is quite intuitive: Though the archive
must be truncated, one would like it to be as 'diverse' as possible. Hence, individuals
in a densely populated area receive lower values and are discarded from the archive in
favor of others. Different implementations of this concept are applied in several
algorithms (see Table 2). In this study we represent the method used in NSGA-II by the
operator truncate_4: For each individual, the absolute difference between its predecessor
and its successor in each objective coordinate is aggregated; higher total values lead to
better ranks.
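
The NSGA-II-style density ranking described above can be sketched as follows. Two points are assumptions borrowed from the usual crowding-distance convention rather than stated in the text: boundary points in each objective receive an infinite value, and no normalization by the objective range is applied. The truncation is done in one shot (no recomputation after each removal), and the function names are hypothetical.

```python
def crowding_distances(points):
    """Sum, over all objectives, of the gap between each point's two neighbours."""
    n = len(points)
    dist = [0.0] * n
    for k in range(len(points[0])):
        order = sorted(range(n), key=lambda i: points[i][k])
        # Boundary points are always preferred (usual NSGA-II convention).
        dist[order[0]] = dist[order[-1]] = float("inf")
        for pos in range(1, n - 1):
            gap = abs(points[order[pos + 1]][k] - points[order[pos - 1]][k])
            dist[order[pos]] += gap
    return dist


def truncate_by_density(points, max_size):
    """Keep the max_size points with the largest crowding distance."""
    if len(points) <= max_size:
        return list(points)
    dist = crowding_distances(points)
    keep = sorted(range(len(points)), key=lambda i: dist[i], reverse=True)[:max_size]
    return [points[i] for i in sorted(keep)]
```

Note that points with identical objective vectors are treated like any other neighbours here, which is one place where a technique of this kind can discriminate poorly.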

Table 2 gives an overview of these different techniques and the implementations
we use in our experiments. As a baseline, the truncate_0 operator is included, which
represents the unlimited archive.

5.2 Experiments 

At first, two experiments were carried out in which the normalized mutation rate was
fixed at σ = 1 and σ = 4, respectively, while for each trial pe was again varied randomly
in [0, 1] and one of the four truncate operators was picked with equal probability.

For σ = 4 the truncate operator had no significant effect on the results at all. This
is not surprising since in these cases the archive size rarely exceeds a. For the small
mutation rate σ = 1, where we had very large archives in the experiments before, the
effect of the truncate operator becomes significant. Fig. 5 (left) shows the box plots for
the different operators. However, there is a strong interaction with the elitism intensity:
The effect of the truncate operator increases with pe. Once again, if the archive is not

Table 2. Archive truncation methods in multi-objective evolutionary algorithms and operator
instances for this study.

Method         Operator    Examples                     Features
No Reduction   truncate_0  VV, PR                       may grow very large, genetic drift
Conservative   truncate_1  AR-1, AR-2                   efficiency preserving, unreachable points
Random         truncate_2  -                            easy implementation, low complexity, genetic drift
Clustering     truncate_3  SPEA [20]                    good discrimination, adaptive metrics, high complexity
Density-based  truncate_4  PAES, M-PAES, PESA, NSGA-II  good discrimination, medium complexity



Fig. 5. Left: Box plots for n = 500, σ = 1 and different truncate operators. The boxes
range from the lower to the upper quartile of the observations, the median is marked with
a bold line, outliers with an asterisk. The notches show confidence intervals for the
group medians; by not overlapping they indicate a difference in location on a rough 5%
significance level. Right: Interaction plots for the combined influence of the truncate
operator and the pe level on V. The median for each combination is shown.

used in the reproduction of offspring (pe = 0), the effect of the truncate operator is the
weakest; this can be seen in the interaction plots in Fig. 5 (right).

As a result we can claim that, if a reduction of the archive is necessary, it should
be done carefully to minimize information loss. The random choice is always worst.
In no case was the clustering approach significantly inferior to the unbounded archive,
while it was the best for large pe and low σ = 1. This shows that a 'good' reduction
method is able to bias the search in favor of under-represented regions. Surprisingly, the
density-based method does not reach significantly higher V values than the conservative
method. The reason may be found in the specific technique, which has difficulty dealing
with identical objective vectors, and not in this type of reduction method.

Nonetheless, there are other methods which directly influence the sampling rate of 
individuals based on their density, these will be discussed in the next section. One of 
the questions will be whether density dependent selection itself will lead to a higher 
performance gain than archive reduction. 

5.3 Key Results

Depending on the choice of the truncation operator, archive reduction can either decrease 
or increase the performance: 

- The clustering-based method leads to an increase of the average performance over the
whole range of pe and σ values. In other words: Clustering reduces the algorithm's
sensitivity to suboptimal parameter settings without, however, a further improvement
for the optimal settings.

- Random truncation led to a significantly worse performance than the conservative
method, which in turn was not significantly worse than the unbounded archive.



6 Density Dependent Selection 

In multi-objective optimization, a uniform distribution of efficient points may be desir- 
able in general. Unfortunately, the randomness in the evolutionary operators (genetic 
drift), the granularity of the search space or the topology of the objective function can 
make the archived solutions as well as the population itself exhibit a rather non-uniform 
distribution. A uniform sampling from the archive even reinforces these effects: Over- 
represented regions will be sampled by parents more often, and - given at least a certain 
locality of the variation operators - more offspring will be produced there. 

One way to tackle this problem indirectly can be through a reduction of the archive, 
as described in the last section. A more direct approach, however, would be to bias the 
selection process in favor of under-represented regions. This will then be independent 
of the actual archive size and is much more flexible. 

6.1 Density Estimation Methods 

The density of individuals in a set A can serve as an a posteriori estimate of the probabil- 
ity density for the creation of the individuals in this set. This probability distribution is 
usually implicitly defined by the stochastic process which governs the evolutionary algo- 
rithm. However, it can easily be estimated (using standard probability density estimation 
techniques) and then be used to bias the sampling rates accordingly. 

The relevance of density estimation in the context of (multi-objective) evolutionary
algorithms has been put forward in previous work, where the authors noted that the
standard fitness sharing concept is essentially the application of a kernel density
estimator. In [5], existing results from kernel density estimation were used to derive
guidelines for the fitness sharing parameters.

Many advanced multi-objective evolutionary algorithms use some form of density
dependent selection. Furthermore, nearly all techniques can be expressed in terms of
density estimation; a classification is given in Table 3. We will make use of this as a
further step towards a common framework of evolutionary multi-objective optimizers,
and present the relevant enhancement of the unified model.

A straightforward density estimate is the histogram: The (multi-variate) space is 
divided into equally sized cuboids, and the number of points inside a cuboid is the estimate 
of the density for this subspace. Here we apply this technique in the evaluate_1 operator. 
Kernel density estimation is represented in evaluate_2 by the niche count function 
Table 3. Density estimation techniques in multi-objective evolutionary algorithms and operators 
used in this study. 

Method            Operator    Examples                             Continuous          Smoothing control
None              evaluate_0  -                                    -                   -
Histogram         evaluate_1  PAES, M-PAES, PESA                   no                  bin width
Kernel            evaluate_2  all algorithms with fitness sharing  as kernel function  window width
Nearest Neighbor  evaluate_3  NSGA-II                              no                  local density




Fig. 6. Box plots for n = 500, $\sigma = 1$ (left) and $\sigma = 4$ (right) for different evaluate operators. 
For n = 250 and n = 750 the plots are similar. 

known from standard fitness sharing. The evaluate_3 operator uses the technique from 
NSGA-II, where the distances to neighboring individuals are used to calculate a volume 
around each individual. The reciprocal value of this volume then serves as an estimate 
of the density, which is typical for nearest neighbor density estimates. 
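
The three estimators described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the operators used in the study: the function names, the triangular sharing kernel, and the way the nearest-neighbor volume is formed (product of the per-objective distances between the two neighbors, with boundary points given an infinite volume) are choices made here for illustration only.

```python
import math


def histogram_density(points, bin_width):
    """Histogram estimate: number of points falling into each point's cuboid."""
    cells = [tuple(math.floor(c / bin_width) for c in p) for p in points]
    counts = {}
    for cell in cells:
        counts[cell] = counts.get(cell, 0) + 1
    return [counts[cell] for cell in cells]


def niche_count_density(points, sigma_share):
    """Kernel estimate: niche count with a triangular sharing kernel (assumed)."""
    def sharing(distance):
        return max(0.0, 1.0 - distance / sigma_share)
    return [sum(sharing(math.dist(p, q)) for q in points) for p in points]


def nearest_neighbor_density(points):
    """Nearest-neighbor estimate: reciprocal of a volume spanned by the neighbors."""
    n, dims = len(points), len(points[0])
    volume = [1.0] * n
    for k in range(dims):
        order = sorted(range(n), key=lambda i: points[i][k])
        # Boundary points have only one neighbor along this objective.
        volume[order[0]] = volume[order[-1]] = math.inf
        for pos in range(1, n - 1):
            i = order[pos]
            if not math.isinf(volume[i]):
                volume[i] *= points[order[pos + 1]][k] - points[order[pos - 1]][k]
    density = []
    for v in volume:
        if math.isinf(v):
            density.append(0.0)
        elif v == 0.0:
            density.append(math.inf)
        else:
            density.append(1.0 / v)
    return density
```

All three functions return one density value per point, so they can be exchanged freely in a selection scheme that only needs a per-individual density.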

For the calculation of the target sampling rates, the normalized density estimate for 
each solution is added to its basic cost. If the minimal basic cost difference of differently 
valued solutions is at least 1 (as in our case of the dominance level), it is assured that 
individuals with a worse basic cost always receive lower sampling rates regardless of their density. 
Thus, this method is used to bias selection only between equally valued individuals. 
Finally, a rank based assignment of target sampling rates is performed as before. 
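
A small sketch of this cost adjustment follows. It assumes finite density values and integer-spaced basic costs (such as dominance levels); the normalization by `max(density) + 1` is one hypothetical way to keep the added term strictly below 1 and is not taken from the study.

```python
def combined_cost(basic_cost, density):
    """Add a density term in [0, 1) to each basic cost (lower cost is better)."""
    scale = max(density) + 1.0  # assumed normalization, keeps every term below 1
    return [c + d / scale for c, d in zip(basic_cost, density)]


def rank_order(cost):
    """Indices sorted from best (lowest cost) to worst."""
    return sorted(range(len(cost)), key=lambda i: cost[i])
```

Because the density term never reaches 1, an individual on dominance level 1 always ranks ahead of every individual on level 2, and density only decides the order within a level.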

6.2 Experiments 

In order to first investigate the effect of the density estimation alone, the elite preserving 
truncate_1 operator is used^ and the maximal archive size is set to a := b. $\sigma \in [1, 10]$ 
and $p_e \in [0, 1]$ are again chosen randomly. The evaluate_0 operator is the reference case, 
where no density estimation is applied. 

In contrast to the previous section, the influence of the evaluate operator is now 
significant for the whole range of $\sigma \in [1, 10]$. Figure 6 shows the box plots, again for 

^ We chose this as the common baseline since it did not show significant differences to the 
unlimited archive. 
Fig. 7. Interaction plots for n = 500 for different evaluate operators. The upper row shows no 
density estimation (left) and the histogram method (right), the middle row the kernel density esti- 
mator (left) and the NSGA-II-based method (right), the lower row the clustering based truncation 
operator combined with the kernel density estimator (left) and the histogram method (right). 



the medium size problem with n = 500. Notice that for $\sigma = 1$ the location differences 
of the V distribution between the first three operators are stronger than for $\sigma = 4$, while 
for $\sigma = 4$ the differences in the minimal V values are stronger. 

Fig. 7 visualizes how the behavior of the different evaluate operators interacts 
with different $p_e$ and $\sigma$ settings. Obviously, the kernel density estimator as well as the 
histogram method lead to lower V values in all groups. The NSGA-II-based method, 
however, does not change performance at lower mutation rates and only improves a 
few groups with stronger mutation. The corresponding fronts show that the performance 
gain of density dependent selection is only due to a broadening of the current 
non-dominated set, while the approximation does not come closer in the middle. Comparisons 
to the real efficient set show that the evolutionary algorithm still finds solutions only 
in the central region of the real efficient set, even with the density dependent selection 
operators considered here. 

Finally, the different density estimation techniques are combined with the clustering 
based truncate operator. Though this does not improve much on the top performance, 
the lower row of Fig. 7 shows that the truncation method reduces the sensitivity to 
suboptimal parameter settings, especially for low mutation rates. 

6.3 Key Results 

The performance gain of density dependent selection strongly depends on the accuracy 
of the applied density estimator: 

- In general, the kernel density estimator as well as the histogram method improves 
the performance for all $p_e$ and mutation rate settings. 

- The combination of these techniques with the clustering-based archive truncation 
method leads to synergy effects in the sense that the algorithm becomes even less 
sensitive to suboptimal parameter settings than without density based selection. 

7 Conclusion and Outlook 

In this study we identified and analyzed the key elements of evolutionary multi-objective 
optimizers. From the results we can derive some design guidelines: 

- Strong elitism should be used to achieve best performance, but in connection with 
a high mutation rate. The right combination of elitism intensity and mutation rate is 
the decisive factor for the performance. 

- Density based selection can further improve the algorithmic performance by a 
broader distribution of solutions along the trade-off surface. Here, enough effort 
should be put in the density estimation technique since only a good estimate brings 
forth the desired results; fitness sharing as a simple kernel density estimator is a 
good choice. 

- A good archive reduction method like clustering should be incorporated to make the 
algorithm robust concerning suboptimal parameter settings. However, the trunca- 
tion operator has to be chosen carefully as inappropriate methods can decrease the 
performance. 

As for the methodology of comparing different algorithms it should be noted that 

- In comparisons of elitist against non-elitist algorithms not only the influence of the 
mutation rate must be considered but also the intensity of the elitism, which is often 
defined implicitly. 

- When new algorithms are developed, the effect of all new features should be exam- 
ined separately. 

At present, these results are only valid for the bi-objective knapsack problem, but we 
believe that some fundamental characteristics have been found. It is subject to ongoing 
research to verify this for other problem classes and higher objective space dimensions. It 
can be expected, however, that performance differences increase when more objectives 
and more difficult problems are considered. 
Abstract. In this paper we discuss some questions of applying evolutionary 
algorithms to multiobjective optimization problems with continuous 
variables. A main question of transforming evolutionary algorithms 
for scalar optimization into those for multiobjective optimization con- 
cerns the modification of the selection step. In an earlier article we have 
analyzed special properties of selection rules called efficiency preservation 
and negative efficiency preservation. 

Here, we discuss the use of these properties by applying an accordingly 
modified selection rule to some test problems. The number of efficient 
alternatives of a population for different test problems provides a better 
understanding of the change of data during the evolutionary process. 
Also effects of the number of objective functions are treated. We also 
analyze the influence of the number of objectives and the relevance of 
these results in the context of the 1/5 rule, a mutation control concept 
for scalar evolutionary algorithms which cannot easily be transformed 
into the multiobjective case. 

Keywords: multicriteria decision analysis, stochastic search, evolution- 
ary algorithms, selection mechanism, step sizes, 1/5 rule. 



1 Introduction 

For some years there has been a growing interest in using evolutionary algorithms, 
e.g. genetic algorithms (GAs) (see [E], 0) or evolution strategies (see 
ID), for multiobjective optimization (see 0,0, and m for surveys). The 
main problem of transforming the algorithms from scalar optimization methods 
into multiobjective optimization tools lies in the modification of the selection 
procedure. In many approaches this modification is based on a more or less sim- 
ple aggregation of the several objective functions into one (scalarization). This 
procedure results in some difficulties, e.g. that such algorithms converge (if they 
converge) to one or several specific solutions while others cannot be obtained. 
For these and other reasons, the main interest has focussed on selection 
strategies which use the Pareto order only. Earlier analyses have shown that ad- 
ditional requirements are needed to prove that such algorithms converge towards 
efficient solutions m, for other results see EH)- These are elitist and conser- 
vative features of selection which may have other disadvantages. Especially, a 
slower introduction of new generated data is a possible drawback. Below, we 
show some experimental results concerning this question. 

Another open problem concerns the adaptation of the parameters controlling 
the strength of mutations. In evolution strategies, the alternatives are encoded 
as floating point numbers instead of bit strings as in genetic algorithms. This 
allows a more appropriate representation of multiobjective optimization problems 
with continuous variables^ and makes it possible to reach results of higher precision. To 
approach an optimal solution quickly, the 1/5 rule has been proposed as an 
adaptation procedure for the mutation strengths in scalar evolutionary algorithms. 
Below, we discuss why this rule is not applicable in the multiobjective case. 

2 Terminology 

Let us consider a multiobjective programming (MOP) problem defined as follows: 

(MOP) $\min_{a \in A} f(a)$ with $f : \mathbb{R}^n \to \mathbb{R}^q$, $q \ge 2$, 

with $A = \{a \in \mathbb{R}^n : g_j(a) \le 0,\ j \in \{1, \dots, m\}\} \ne \emptyset$. 

$f$ is a vector-valued function consisting of components $f_k : A \to \mathbb{R}$ which are 
called objectives or criteria. $A$ is the set of feasible alternatives (or solutions) 
of the optimization problem. The functions $g_j : \mathbb{R}^n \to \mathbb{R}$ are called restriction 
functions. Setting $q = 1$, the (MOP) becomes a scalar optimization problem. 

The most important mathematical solution concept for an MOP problem is 
called efficiency, or Pareto optimality (see, e.g., [7]). For its definition, a 
generalization of the “$\le$” relation for vectors $x$, $y$ in the objective space is necessary: 

$x \le y$ iff $x_k \le y_k$ for all $k \in \{1, \dots, q\}$ and $x \ne y$. 

We use “$\leqq$” for the component-wise generalization of the scalar “$\le$”: 

$x \leqq y$ iff $x_k \le y_k$ for all $k \in \{1, \dots, q\}$. 

The set $E(A, f)$ of efficient (or Pareto-optimal) solutions is defined as: 

$E(A, f) := \{a \in A : \nexists\, b \in A : f(b) \le f(a)\}$. 
$a \in E(A, f)$ is called efficient (or Pareto-optimal). 
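
For a finite set of alternatives, the definition of efficiency can be checked directly. The following sketch assumes minimization of all objectives; the function names are illustrative and not part of the paper.

```python
def dominates(x, y):
    """Vector relation used for efficiency: x_k <= y_k for all k and x != y."""
    return all(a <= b for a, b in zip(x, y)) and tuple(x) != tuple(y)


def efficient_set(alternatives, f):
    """Alternatives whose objective vector is dominated by no other one."""
    values = [f(a) for a in alternatives]
    return [a for a, fa in zip(alternatives, values)
            if not any(dominates(fb, fa) for fb in values)]
```

For example, with the identity as objective function, `efficient_set([(1, 4), (2, 2), (3, 3), (4, 1)], lambda a: a)` keeps every point except `(3, 3)`, which is dominated by `(2, 2)`.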

^ Although, evolution strategies have so far only been scarcely developed and applied 
for multiobjective programming (see Da, dhi, i). More recent approaches using 
floating number representations of decision variables were presented, for instance, at 
the PPSN V conference |31 and at the Workshop on Multiobjective Problem Solving 
from Nature (MPSN) DSI- 
The set of efficient solutions usually consists of many alternatives. For decision 
problems in practice it is often desired to select one alternative as ‘solution’ 
of the problem. For this purpose additional information related to the decision 
maker’s preferences is requested. The property of efficiency is usually regarded 
as a minimum requirement for any such ‘compromise solution’. For details on 
methods supporting a decision maker see, for instance, [T^ . pS). P7) . or 

Let us denote the parent population of a multiobjective evolutionary algorithm 
(MOEA) in generation $t$ as $M^t = \{a_1^t, \dots, a_\mu^t\}$ with $a_l^t \in A \subseteq \mathbb{R}^n$ for 
$l \in \{1, \dots, \mu\}$. The set of efficient alternatives of population $M^t$ is denoted as 
$E(M^t, f)$. The offspring population in $t$ is $N^t = \{b_1^t, \dots, b_\lambda^t\}$ with $b_l^t \in A$ for 
$l \in \{1, \dots, \lambda\}$. 

For an alternative $a \in A$ we define the dominating set as 

$Dom(a) := \{a' \in A : f(a') \leqq f(a)\}$. 

For a population $M^t$ the dominating set is defined as 

$Dom(M^t) := \bigcup_{l=1}^{\mu} Dom(a_l^t)$. 

For an alternative $a \in A$ the dominated set is defined as 

$Dom^-(a) := \{a' \in A : f(a) \le f(a')\}$. 

The dominated set for a population $M^t$ is then defined as 

$Dom^-(M^t) := \bigcup_{l=1}^{\mu} Dom^-(a_l^t)$. 

More details on dominating and dominated sets can be found in HD. 
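
Membership in the dominating and dominated set of a population can be tested on objective vectors alone. The sketch below assumes minimization and works on the objective values of the population members; the function names are hypothetical.

```python
def weakly_dominates(x, y):
    """Component-wise relation: x_k <= y_k for all k (equality allowed)."""
    return all(a <= b for a, b in zip(x, y))


def dominates(x, y):
    """Component-wise relation with x != y."""
    return weakly_dominates(x, y) and tuple(x) != tuple(y)


def in_dominating_set(value, population_values):
    """True if value is at least as good as some population member."""
    return any(weakly_dominates(value, v) for v in population_values)


def in_dominated_set(value, population_values):
    """True if some population member dominates value."""
    return any(dominates(v, value) for v in population_values)
```

Note that, following the definitions above, a population member belongs to its own dominating set but not to its own dominated set.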

3 Selection Mechanisms 

Compared with scalar optimization, the multiobjective nature of the MOP prob- 
lem causes difficulties for the selection step of an evolutionary algorithm while 
other steps like mutation or recombination of alternative values are not neces- 
sarily affected by the multiobjective nature of the alternative evaluations. 

In the scalar case, alternatives are judged by a single (real-valued) objective 
function which allows a linear order to be defined on the objective evaluations. With 
this, alternatives can be (completely) rank-ordered, a best alternative can be 
defined and so on. (Canonical) genetic algorithms then define probabilities of 
an alternative’s reproduction based on its relative fitness. In evolution strategies 
usually an elitist selection strategy is applied which chooses the $\mu$ best of the 
$\lambda$ children (comma strategy) or of the $\lambda$ children and $\mu$ parents together (plus 
strategy) as parents for the next generation. 
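
The two scalar selection schemes can be sketched in a few lines. The sketch assumes that fitness is to be minimized and that populations are plain Python lists; the function names are illustrative.

```python
def comma_selection(offspring, fitness, mu):
    """(mu, lambda) selection: the mu best of the lambda children."""
    return sorted(offspring, key=fitness)[:mu]


def plus_selection(parents, offspring, fitness, mu):
    """(mu + lambda) selection: the mu best of parents and children together."""
    return sorted(parents + offspring, key=fitness)[:mu]
```

With the plus variant the best fitness in the population can never get worse from one generation to the next, which is the monotonicity property referred to later in the text.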

Considering a multiobjective evaluation of alternatives, these and similar 
concepts cannot be applied since only a partial order (in objective space) or 
Pareto order (in decision space) is naturally defined on the objective evaluations. 
This implies that there may be alternatives which are not comparable (better 
or worse) considering fitness. There is no obvious way to define a ranking order, 
probabilities for reproduction etc. for which fitness serves in scalar optimization. 
In the literature (see, e.g., m, and HD for surveys) different approaches 

to overcome this problem have been proposed. 

A quite simple approach is to define a scalarization function which maps 
the q objective functions to a single aggregated one such that a problem of type 
(SOP) can be analyzed. Approaches based on scalarization (see [El, for a discus- 
sion of theoretical aspects) are also often used in multicriteria decision making 
(MCDM), for instance in utility theory. Other MCDM scalarization concepts 
applicable in evolutionary algorithms are additive weighting and reference point 
approaches. Using such a scalarization function for the q objectives the familiar 
selection processes for (scalar) evolutionary algorithms can be used. 

Several problems may be involved with such scalarization approaches: For 
instance, they do not work well in approximating the efficient set, they possibly 
have difficulties generating all efficient solutions in the concave case (see 0) , or 
they do not allow a ‘truly’ multiobjective analysis etc. 

Some modifications of the scalarizing concept have been proposed to allow 
a generation of a diversified set of solutions approximating the efficient set. 
One approach uses the q different scalar objective functions. In each selection 
step one of these functions is chosen randomly (El, HD). Another approach 
K2I is based on a division of the population into q groups and a selection in 
each group k according to objective function k. Fourman |D proposes to use 
pairwise selection based on one objective k selected according to pre-selected 
priorities (probabilities) randomly. Also for such approaches similar problems 
with representing the efficient set as in the scalarization approach occur. 

Another, more consistent direction of transforming scalar evolutionary algorithms 
into multiobjective optimization methods consists in using only the 
Pareto order for selecting alternatives. Some of these approaches are based on 
pairwise comparisons, a kind of tournament selection m while others consider 
the alternative set of the population in total. For instance, an alternative is 
judged by the number of other alternatives which dominate it (see below). Sur- 
veys of selection mechanisms are, for instance, given by Fonseca and Fleming 
and Tamaki, Kita, and Kobayashi |2fi| . Since these methods are parsimonious 
with additional information and usually good in representing the efficient set, 
such approaches are mostly preferred today. 

Two of these selection mechanisms are very simple and have been im- 
plemented within LOOPS, the Learning Object-Oriented Problem Solver (see 
0.H2I) which integrates different MCDM methods, evolutionary algorithms and 
other decision making tools (like, e.g., neural networks). One of them is based 
on the judging of an alternative according to the number of alternatives (plus 1) 
by which it is dominated (called dominance rank or dominance grade) 0. The 
dominance grade works as an auxiliary fitness function. 



Selection Mechanisms and Mutation Control 201 

A similar concept is the dominance level which is based on an iterative def- 
inition of different layers of efficiency [H|. The first layer includes the efficient 
alternatives of the population. The second one is defined by those alternatives 
which are efficient when the first layer is removed, etc. The number of the layer 
of an alternative defines an auxiliary fitness function to be minimized. 
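
Both auxiliary fitness functions can be computed from the objective vectors of a population. The following is a straightforward sketch assuming minimization; it favors clarity over speed and is not the LOOPS implementation.

```python
def dominates(x, y):
    """x_k <= y_k for all k and x != y."""
    return all(a <= b for a, b in zip(x, y)) and tuple(x) != tuple(y)


def dominance_grade(values):
    """Number of alternatives (plus 1) by which each alternative is dominated."""
    return [1 + sum(dominates(v, u) for v in values) for u in values]


def dominance_level(values):
    """Layer number obtained by repeatedly removing the efficient alternatives."""
    level = [0] * len(values)
    remaining = set(range(len(values)))
    current = 1
    while remaining:
        layer = {i for i in remaining
                 if not any(dominates(values[j], values[i]) for j in remaining)}
        for i in layer:
            level[i] = current
        remaining -= layer
        current += 1
    return level
```

Efficient alternatives receive the value 1 under both schemes, so the two measures differ only in how they grade the inefficient part of the population.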

Although these approaches seem to overcome some of the problems caused 
by other selection mechanisms several possible disadvantages should be noted: 
The discrimination power of the dominance grade or dominance level as crite- 
ria for selection becomes low, especially in a progressed state of evolution or in 
high dimensional objective spaces. Below, there are some test results 
demonstrating such effects (for a modified dominance level approach). When at least 
$\mu$ alternatives of the $M^t \cup N^t$ population become efficient, they have the same 
dominance grade and dominance level such that a selection among them becomes 
arbitrary unless some additional selection rules are applied. Such selection rules 
are discussed in the next section. They have been introduced in HH to ensure 
convergence and other interesting properties. 

4 Efficiency Preservation and Negative Efficiency 
Preservation 

The properties of efficiency preservation and negative efficiency preservation have 
been developed to have a multicriteria analogue to monotonicity properties as 
fulfilled in some evolutionary algorithms as, for instance, $(\mu + \lambda)$ evolution 
strategies for scalar optimization, which guarantee monotonically improving best values 
of the population. Such properties are useful for proving the convergence of an 
algorithm. Using efficiency preservation, the convergence of multicriteria evolu- 
tionary algorithms is shown for ‘regular’ MOP problems in m- 

A MOEA is called efficiency preserving iff for all $t \ge 0$ and $\emptyset \ne M^0 \subseteq A$ 

$Dom(M^{t+1}) \subseteq Dom(M^t)$ 

is fulfilled. It is called negatively efficiency preserving iff for all $t \ge 0$ and 
$\emptyset \ne M^0 \subseteq A$ 

$Dom^-(M^{t+1}) \supseteq Dom^-(M^t)$ 

holds. 

Simply said, efficiency preservation is based on the idea of reducing the dom- 
inating set during the evolution while negative efficiency preservation increases 
the dominated set. In El, we have shown that these properties are mutually 
independent. 

Although efficiency preservation implies interesting convergence properties 
for MOEAs, this property easily brings up problems for a good representation 
of the efficient set because alternatives in $E(A, f) \setminus Dom(M^t)$ are unreachable. 

For negative efficiency preservation, an equivalent formulation has been found 
m which is more appropriate for algorithmic application: 



$a^t \in E(M^t, f) \Rightarrow \exists\, a^{t+1} \in M^{t+1} : f(a^{t+1}) \leqq f(a^t)$ for all $t \ge 0$. 

This means that an efficient parent $a^t$ is either transferred into the next generation 
or replaced by an offspring alternative $a^{t+1} \in N^t$ such that $a^{t+1}$ dominates 
$a^t$ (or $f(a^{t+1}) = f(a^t)$). 

Such a ‘conservative’ selection property can be regarded as a multiobjective 
generalization of the ‘plus’ concept ($(\mu + \lambda)$ evolution strategies) with elitist 
selection where parents survive as long as they are better than their offspring as 
discussed for scalar evolutionary algorithms. 

The dominance grade and dominance level methods are modified as follows 
to provide negative efficiency preservation: First, dominance grades and 
dominance levels are calculated as usual. Then, for all parent alternatives this value 
is decreased by, for instance, 0.5. Because of this, parent alternatives which are 
efficient in $M^t \cup N^t$ are preferred to efficient offspring. Offspring replace parents 
which are efficient in $M^t$ only when they dominate them. To provide a 
maximum change of data while fulfilling negative efficiency preservation, we prefer 
inefficient offspring to inefficient parents with the same grade or level. 
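
The preference rules of this paragraph can be sketched as a sort key on the dominance level. This is only one possible reading and not the author's implementation: it assumes that the bonus of 0.5 is applied to efficient parents, while among inefficient alternatives of equal level the offspring is ranked first (here by a hypothetical penalty of 0.25 on parents). The sketch illustrates the ordering only and makes no claim about convergence.

```python
def dominates(x, y):
    """x_k <= y_k for all k and x != y."""
    return all(a <= b for a, b in zip(x, y)) and tuple(x) != tuple(y)


def dominance_level(values):
    """Layer number obtained by repeatedly removing the efficient alternatives."""
    level = [0] * len(values)
    remaining = set(range(len(values)))
    current = 1
    while remaining:
        layer = {i for i in remaining
                 if not any(dominates(values[j], values[i]) for j in remaining)}
        for i in layer:
            level[i] = current
        remaining -= layer
        current += 1
    return level


def select_next_parents(parents, offspring, f, mu):
    """Choose mu alternatives from parents and offspring by modified level."""
    pool = [(a, True) for a in parents] + [(b, False) for b in offspring]
    level = dominance_level([f(a) for a, _ in pool])

    def key(i):
        is_parent = pool[i][1]
        if level[i] == 1:
            # Efficient parents are preferred to efficient offspring.
            return level[i] - (0.5 if is_parent else 0.0)
        # Assumed tie-break: inefficient offspring before inefficient parents.
        return level[i] + (0.25 if is_parent else 0.0)

    order = sorted(range(len(pool)), key=key)
    return [pool[i][0] for i in order[:mu]]
```

Since the adjustments stay below 1 in absolute value, they never move an alternative past a neighboring dominance level; they only break ties within a level.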

Let us finally remark that also many other multicriteria selection methods 
for evolutionary algorithms as discussed in the literature (see above) are neither 
efficiency preserving nor negatively efficiency preserving. 



5 Some Test Results 



In this section, we show some experimental results with applying a selection 
concept as outlined above. Especially, we focus on the speed of obtaining a 
situation of a population consisting of efficient alternatives (with respect to the 
population) only for test problems of different ‘complexity’ and for different 
population sizes. 

For the application runs of the MOEA the following parameters have been 
chosen: The mutation strengths are initialized with $\sigma_i = 0.1 \cdot x_i^0$ where $x^0$ is the 
starting point with $x_i^0 > 0$ for $i \in \{1, \dots, n\}$. Since the 1/5 rule is not applicable 
for MOEAs (see Section 6) we have chosen a concept of mutating the mutation 
strengths which is done with a = This means that mutation rates are 

changed by 10% on the average per generation. Recombination is performed be- 
tween two randomly chosen alternatives with probability 0.1. The recombination 
is non-intermediary. Fitness sharing techniques are not applied. 

To ensure negative efficiency preservation let us summarize the following 
necessary adaptations and parameter assessments: 

1. The selection scheme is based on Pareto information only, i.e. dominance 
grade or dominance level. (Because both methods produce very similar re- 
sults, only those for the dominance level approach are shown below.) 

2. This scheme has to be modified (see above). 

3. Parents and offspring are both considered during the selection (as in plus 
evolution strategies). 

4. An elitist selection according to the evolution strategy concept is applied. 
This is necessary because technically the results of the multicriteria fitness 
evaluation are scalar values, e.g. the number of alternatives (plus 1) by which 
the considered alternative is dominated. Therefore, it is essential to consider only the 
$\mu$ best alternatives as parents of the next generation (unlike in standard 
GAs where fitness determines selection probabilities). 

We consider the following sizes of the parent generations: $\mu = 10$, $\mu = 50$, 
and $\mu = 250$. For the number of offspring, $\lambda$, always $\lambda = \mu$ is chosen. The number 
of simulated generations, #gen, is adapted to the population size because for 
a small population, a situation where the number of efficient alternatives 
approximately equals $\mu$ is reached faster than for larger population sizes. For $\mu = 10$, 
we choose #gen = 100, for $\mu = 50$, #gen = 200, and for $\mu = 250$, #gen = 1000. 

Let us consider the following three MOP problems with n = q = 2 already 
analyzed using a MOEA in mg: The first one is a convex MOP with quadratic 
objective functions: 

(P1) $\min f(x)$, $f : \mathbb{R}^2 \to \mathbb{R}^2$, $f_i : x \mapsto x_i^2$ for $i \in \{1, 2\}$ 

with 

$x \in X = \{x \in \mathbb{R}^2 : x_1 \ge 0,\ x_2 \ge 0,\ x_1 + x_2 \ge 5\}$. 

The second problem captures the problem of a disconnected efficient set 
caused by an appropriate restriction function with multiple local optima: 

(P2) $\min f(x)$, $f : \mathbb{R}^2 \to \mathbb{R}^2$ with $f_i : x \mapsto x_i$ for $i \in \{1, 2\}$ 

and 

$x \in X = \{x \in \mathbb{R}^2 : x_1 \ge 0,\ x_2 \ge 0,\ x_2 - 5 + 0.5\, x_1 \sin(4 x_1) \ge 0\}$. 

For this problem the efficient set consists of 7 separated regions. 

The third problem is even more complex because of locally (but not globally) 
efficient regions which are caused by non-monotonic objective functions (which 
are additionally not continuous): 

(P3) $\min f(x)$, $f : \mathbb{R}^2 \to \mathbb{R}^2$ with 

$f_1 : x \mapsto \mathrm{int}(x_1) + 0.5 + (x_1 - \mathrm{int}(x_1)) \sin(2\pi (x_2 - \mathrm{int}(x_2)))$, 
$f_2 : x \mapsto \mathrm{int}(x_2) + 0.5 + (x_1 - \mathrm{int}(x_1)) \cos(2\pi (x_2 - \mathrm{int}(x_2)))$, 

and 

$x \in X = \{x \in \mathbb{R}^2 : x_1 \ge 0,\ x_2 \ge 0,\ x_1 + x_2 \ge 5\}$. 

This problem is based on a partition of the decision space into squares $[i, i+1) \times 
[j, j+1)$, $i, j \in \mathbb{N}$, which are mapped onto circles with radius 1 using a 
‘rectangle into polar’-type of co-ordinate transformation. Because of the int operator 
(which calculates the integer part of its argument) the objective functions are 
neither continuous nor monotonic. Multiple locally efficient regions exist which 
are not (globally) efficient. In mili graphical representations for these problems 
and additional information are provided. 
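
The mapping of problem (P3) can be written down directly from the formulas above. The sketch below uses `math.floor` for the int operator, which agrees with the integer part for the non-negative arguments of the feasible set; the function names are illustrative.

```python
import math


def p3_objectives(x1, x2):
    """Objective vector of (P3): a square cell is mapped onto a disc."""
    i1, i2 = math.floor(x1), math.floor(x2)
    radius = x1 - i1                      # fractional part of x1, in [0, 1)
    angle = 2.0 * math.pi * (x2 - i2)     # fractional part of x2 as an angle
    return (i1 + 0.5 + radius * math.sin(angle),
            i2 + 0.5 + radius * math.cos(angle))


def p3_feasible(x1, x2):
    """Feasibility test for the restriction set of (P3)."""
    return x1 >= 0 and x2 >= 0 and x1 + x2 >= 5
```

Every point of a cell is mapped to a point whose distance from the cell's image center (int(x1) + 0.5, int(x2) + 0.5) equals the fractional part of x1, which is what produces the circular images described in the text.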
In contrast to m, these problems are now solved using the modified 
dominance level method, which ensures negative efficiency preservation. Results of the 
(modified) dominance grade model are not shown because they are very similar, 
i.e. selection results for efficient alternatives do not differ. 

Instead, we are interested in the number of efficient solutions during the 
simulation which mainly influences the possibility of getting new data into the 
population. Below there are 3 figures showing the number of efficient 
alternatives in $M^t \cup N^t$ for each generation number during the experiments.^ In each 
graphic we have the results for the 3 different MOP problems given for a different 
population size. 



Fig. 1. Results of a MOEA for $\mu = \lambda = 10$ (number of efficient points plotted against the generation number, 0 to 100). 



In all cases we observe an increase of this number until it is approximately 
equal to $\mu$. For larger populations this process obviously takes longer than for 
small populations. Except for the small population size with $\mu = 10$ where such 
an effect is not clearly visible, the different complexities of the problems influence 
the time to come to this type of balance: For problem (P1) $\mu$ efficient solutions 
are reached faster than for (P2), and for (P2) faster than for (P3). In these 
situations it is increasingly difficult for offspring alternatives to become parents 
of the next generation. 



^ In the Figures and Table 1, the results for an exemplary test run are shown. Other 
test runs with the same parameters of the evolutionary algorithm led to similar 
results. 
Fig. 2. Results of a MOEA for $\mu = \lambda = 50$ (number of efficient points plotted against the generation number, 0 to 200). 

Fig. 3. Results of a MOEA for $\mu = \lambda = 250$ (number of efficient points plotted against the generation number, 0 to 1000). 



Table 1. Number of generations until reaching $\mu$ efficient alternatives in $M^t \cup N^t$. 

Problem   $\mu = 10$   $\mu = 50$   $\mu = 250$
(P1)      19           59           127
(P2)      23           142          601
(P3)      13           159          663
6 The Non-applicability of the 1/5 Rule 

For scalar evolution strategies the 1/5 rule has been introduced by Rechenberg 
prU (and further specified by Schwefel P- 128-132, |23) as a means for con- 
trolling the mutation strengths. Without an adaptation of mutation strengths, 
it is rather difficult to approach the optimum when the alternatives are already 
close to it (in relationship to the $\sigma_i$ values). On the other hand, for small $\sigma_i$ 
values it takes a long time to proceed to the optimum. Therefore, it is useful to 
start the optimization process with relatively large $\sigma_i$ values and to reduce them 
while coming close to the optimum. 

The 1/5 rule is based on theoretical considerations |2n| which show that it 
is useful to have approximately 1/5 of the offspring alternatives being better 
than their parents. Too low values of $\sigma_i$ lead to a higher percentage of successful 
offspring, too high values lead to a lower percentage. According to the 1/5 rule, 
the step sizes $\sigma_i$ are increased if on average the portion of ‘successful’ offspring 
is larger than 1/5. If the portion is less than 1/5 the step sizes are decreased. 
Direction-specific adaptations of the $\sigma_i$ (i.e. those which change the $\sigma_i / \sigma_j$ ratios) 
are not supported by the 1/5 rule. Therefore, it is only possible to prescribe 
constant scaling factors for the co-ordinate directions because the $\sigma_i$ remain 
in constant proportions (as long as they do not reach a minimum value > 0). 
From scalar optimization, it is known that the 1/5 rule fails when there are no 
continuous first partial derivatives of the objective function (E2|, P- 136f, |24j 1 . 
e.g. because of active restrictions. 
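
For the scalar case, the rule described above can be sketched as a single update of all step sizes. The adaptation factor of 0.85 is an assumption made here for illustration (a value often quoted for evolution strategies) and is not given in this text; the function name is hypothetical.

```python
def one_fifth_rule(sigmas, success_ratio, factor=0.85):
    """Rescale all step sizes according to the observed success ratio."""
    if success_ratio > 0.2:
        scale = 1.0 / factor   # many successes: steps are too small, enlarge
    elif success_ratio < 0.2:
        scale = factor         # few successes: steps are too large, reduce
    else:
        scale = 1.0
    return [s * scale for s in sigmas]
```

All step sizes are multiplied by the same factor, so their ratios stay constant, which is the limitation regarding direction-specific adaptation mentioned in the text.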

For multiobjective applications additional difficulties arise for this kind of 
step size control. Considering multiple objectives, it is no longer possible to 
decide easily whether a new alternative is ‘better’ than an existing one. If we do 
not apply scalarizing functions or similar tools based on additional information 
we can only compare alternatives using the Pareto order. In objective space $\mathbb{R}^q$ 
an alternative $a_j$ is better than an alternative $a_i$ if $a_j$ is in $a_i - (\mathbb{R}^q_+ \setminus \{0\})$. $a_j$ is 
worse than $a_i$ if $a_j$ is in $a_i + (\mathbb{R}^q_+ \setminus \{0\})$. Otherwise, if $a_i$ and $a_j$ are not identical, 
then $a_i$ and $a_j$ are incomparable. 

If $a_j$ is uniformly distributed in a symmetric interval around $a_i$, or if $a_j$ is 
normally distributed with expected value $a_i$, then the probability of $a_j$ being 
better than $a_i$ is $1/2^q$. The probability of $a_i$ being better than $a_j$ is the same. 
The probability of $a_i$ and $a_j$ being incomparable is $1 - (1/2^q + 1/2^q) = 1 - 1/2^{q-1}$. 
The probability of incomparability therefore increases with the dimensionality of 
the objective space. 
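
The value $1 - 1/2^{q-1}$ can be checked by a small simulation. The sketch assumes that the $q$ components of the difference $a_j - a_i$ are independent standard normal variables, which is one instance of the symmetric distributions considered above; the function name and the default parameters are illustrative.

```python
import random


def incomparability_rate(q, trials=20000, seed=0):
    """Estimate the probability that a random neighbor is incomparable."""
    rng = random.Random(seed)
    incomparable = 0
    for _ in range(trials):
        diff = [rng.gauss(0.0, 1.0) for _ in range(q)]
        better = all(d < 0 for d in diff)   # a_j better in every objective
        worse = all(d > 0 for d in diff)    # a_j worse in every objective
        if not better and not worse:
            incomparable += 1
    return incomparable / trials
```

For $q = 2$ the estimate should be close to 0.5 and for $q = 5$ close to 0.9375, which shows how quickly the Pareto order loses its discriminating power as objectives are added.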

In |2n|, the deduction of the 1/5 rule for scalar evolution strategies is based 
on an analysis of two optimization problems considered as representative for 
various other problems: the ball model and the corridor model. The ball model 
is an unrestricted optimization problem where the Euclidean distance to a given 
point xi is minimized. This model is supposed to represent the situation of 
coming close to an optimal solution. In the corridor model the objective value of 
a point increases linearly along a tube of feasible points. Here, the main problem 
is not to leave the corridor. This model is assumed to be representative for 
optimization far away from an optimum. For both models, similar optimal values 
for the progress speed (average improvement in alternative space, or covered 
distance in the optimum direction, per generation) are obtained for a success 
probability (average number of offspring with better objective values than the 
parents) of 1/5. 

The corridor model cannot easily be transformed into a multiobjective model. 
If we keep a one dimensional tube of feasible alternatives and linear objective 
functions we can come to very different situations in objective space. Let $x^0$ be 
the present point of a population of a MOEA with $\mu = 1$. Either almost all of 
the image of the ‘positive half’ of the corridor is enclosed in $f(x^0)$ minus the 
Pareto cone (as in the scalar case) (unbounded problem) or only a small 
compact part of its image lies in the Pareto cone. The first case is very similar 
to the scalar model while in the second case almost all of the corridor’s image 
in objective space is not enclosed in the Pareto cone. The points can march 
only towards a (compact) part of the cone’s intersection with the feasible set 
(in objective space). This problem is related to close approximation rather than 
to avoiding infeasibility. Therefore, we discuss the ball model only. 

The ball model for the multicriteria case can be defined as follows: Let us
consider $q$ goal vectors $x^1, \ldots, x^q \in \mathbb{R}^n$. The objective functions
$f_k$, $k \in \{1, \ldots, q\}$, are given by

$$\min f_k(x) = \sum_{i=1}^{n} (x_i - x_i^k)^2 \quad \text{for } k \in \{1, \ldots, q\}.$$

The feasible set is $A = \mathbb{R}^n$.

Let us consider an arbitrary point $x^0 \in \mathbb{R}^n$ as parent. The probability
density to obtain an offspring point $x \in \mathbb{R}^n$ from $x^0$ is then

$$p(x^0 \to x) = \left(\frac{1}{\sqrt{2\pi}\,\sigma}\right)^{n} e^{-\frac{1}{2\sigma^2} \sum_{i=1}^{n} (x_i - x_i^0)^2}.$$

The expected improvement in alternative space (progress speed) with respect
to objective $k$, $\varphi_k$, is given by




Unfortunately, this integral is not analytically solvable. Additionally, it is not
possible to use approximation functions as in the scalar case. In particular,
one cannot simplify the n-fold integral by a co-ordinate transformation
(into ball co-ordinates). Numerical integration using standard mathematical
software (for instance, MAPLE V) therefore also fails, since this software
requires an integration area given by an interval. Here, the integration area is
the intersection of $q$ n-balls around $x^1, \ldots, x^q$ with $x^0$ on the border of each
of these balls. This set is the dominating set for the one-alternative population
$\{x^0\}$.
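
When the integration area cannot be handled analytically, plain Monte Carlo sampling
is one possible fallback. The sketch below is not the procedure used in this paper.
It only estimates the probability that a normally mutated offspring falls into the
dominating set of the parent in the multicriteria ball model. The goal vectors,
parent point, and mutation strength `sigma` in the example are assumptions chosen
for illustration, and the function names are hypothetical.

```python
import random


def objectives(x, goals):
    """Ball model: squared Euclidean distance of x to each goal vector."""
    return [sum((xi - gi) ** 2 for xi, gi in zip(x, g)) for g in goals]


def dominates(u, v):
    """True if objective vector u Pareto-dominates v (minimisation)."""
    return all(a <= b for a, b in zip(u, v)) and any(a < b for a, b in zip(u, v))


def success_probability(parent, goals, sigma, trials=20000, seed=1):
    """Estimate P(offspring dominates parent) under isotropic normal mutation."""
    rng = random.Random(seed)
    f_parent = objectives(parent, goals)
    hits = 0
    for _ in range(trials):
        child = [xi + rng.gauss(0.0, sigma) for xi in parent]
        if dominates(objectives(child, goals), f_parent):
            hits += 1
    return hits / trials


# Scalar case (q = 1): for small sigma the ball looks like a half-space.
print(success_probability([1.0, 0.0], [[0.0, 0.0]], sigma=0.01))
# Two goals (q = 2): the dominating set shrinks to a lens-shaped region.
print(success_probability([1.0, 1.0], [[0.0, 0.0], [2.0, 0.0]], sigma=0.01))
```

In the two-goal example the gradients of the objectives at the parent are
orthogonal, so for small `sigma` roughly a quarter of the offspring can be expected
to dominate the parent, compared with roughly one half in the scalar example.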

Let us consider the number of offspring alternatives lying in $\mathrm{Dom}(M^*)$ for
the quadratic optimization problem (P1). This problem can easily be generalized
to an arbitrary number of objectives $q$. In Fig. 4, we show the test results for
dimensions $q = 1, 2, 3, 5, 10$. The population consists of $\mu = 100$ parents and
$\lambda = 100$ offspring alternatives. The simulation is performed for 200 generations.
All other parameters are as above.




Fig. 4. Number of offspring alternatives dominating efficient parents. 



In general, we see that the number of dominating offspring alternatives
decreases during evolution. In some cases, i.e. for $q = 3$ and $q = 5$, the number
reaches a maximum after a few generations. During evolution this number
decreases in all cases to an average value of about 1 % or less. This is also true for
the scalar case, which cannot reach the value according to the 1/5 rule (i.e. 20 %)
for $\mu = \lambda = 1$. The highest values on average are obtained for $q = 5$.

Let us also consider the number of offspring alternatives which are not
dominated by any parent, i.e. those which are not in $\mathrm{Dom}^-(M^*)$. In Fig. 5, the
values are represented for the same test runs. Since $a \in \mathrm{Dom}(M^*)$ implies
$a \notin \mathrm{Dom}^-(M^*)$, this number is always greater than or equal to the number of
dominating offspring alternatives. For the scalar case, these numbers are equal
because $\mathrm{Dom}^-(M^*) = A \setminus \mathrm{Dom}(M^*)$.

We see that the number of nondominated offspring alternatives increases
with the number of objectives. Similar results were obtained by Calpine
and Golding as early as 1976. For $q = 2$ the percentage of nondominated
alternatives is about 2-3, for $q = 3$ it is 10-20, for $q = 5$ it is 40-80, and for
$q = 10$ it is almost always 100. Convergence effects during the evolution process
are small.

[Figure: one curve each for q = 1, 2, 3, 5, 10; horizontal axis: generation no.]

Fig. 5. Number of offspring alternatives nondominated by any parents.



Let us finally consider the number of efficient alternatives in the whole
population $M^* \cup N^*$ (Fig. 6). For $q = 2$ and $q = 3$ this number reaches a constant
level of around $\mu$ alternatives after 25 and approximately 70 generations,
respectively. For higher $q$ the number increases further because of many efficient
offspring. For $q = 1$, there was always only one efficient (= optimal) alternative.




Fig. 6. Number of efficient alternatives in population $M^* \cup N^*$.

7 Conclusions

Our test results have shown some fundamental problems with applying
‘conservative’ selection mechanisms in multiobjective evolutionary algorithms. A
population quickly reaches a situation where almost all parents are efficient in $M^*$.
Under efficiency preservation, such a parent can be replaced by an offspring
alternative only when the offspring is better in the Pareto order, which occurs
less and less often with growing $q$.

When negative efficiency preservation is used, offspring which are not
comparable with efficient parents can also become new parents. Such alternatives,
however, cannot replace parents which are still efficient in $M^* \cup N^*$. When the
parent population reaches the ‘all efficient situation’, negative efficiency
preservation does not offer offspring alternatives many advantages for survival
compared with efficiency preservation. Therefore, both concepts may be rather
unfavorable for quick progress.

Further test runs for other optimization problems should be done. In particular,
it would be useful to compare different selection rules concerning the
representation of the efficient set and the speed of computation (in relation to
approximation quality). Such test runs should also help to find more heuristic
information on ‘good’ mutation strengths and their control.

Abstract. The performance of stochastic optimisers can be assessed
experimentally on given problems by performing multiple optimisation 
runs, and analysing the results. Since an optimiser may be viewed as an 
estimator for the (Pareto) minimum of a (vector) function, stochastic 
optimiser performance is discussed in the light of the criteria applicable 
to more usual statistical estimators. Multiobjective optimisers are shown 
to deviate considerably from standard point estimators, and to require 
special statistical methodology. The attainment function is formulated, 
and related results from random closed-set theory are presented, which 
cast the attainment function as a mean-like measure for the outcomes of 
multiobjective optimisers. Finally, a covariance-measure is defined, which 
should bring additional insight into the stochastic behaviour of multiob- 
jective optimisers. Computational issues and directions for further work 
are discussed at the end of the paper. 



1 Introduction 

Stochastic optimisers, such as evolutionary algorithms, simulated annealing and 
tabu search, have found many successful applications in a broad range of scientific 
domains. However, only limited theoretical results concerning their performance 
are available. Typically, simple versions of the algorithms and/or objective func- 
tions must be considered in order to make the theoretical analysis possible, which 
limits their practical applicability. As an alternative, the performance of stochas- 
tic optimisers may be assessed experimentally on given problems by performing 
multiple, independent optimisation runs, and statistically analysing the results. 

Two main issues are raised by such an inferential approach. Firstly, the very
notion of optimiser performance must take into account the stochastic nature of
the optimisers considered, as well as any other relevant optimiser characteristics,
such as scale-independence, for example. As the same considerations apply to
statistical estimators, optimiser performance will be discussed in that light in
Section 2.

Secondly, specific statistical methodology may be needed, depending on the
notion of performance adopted, in order to analyse the data produced by the
optimisation runs. In particular, multiobjective optimisers such as multiobjective
genetic algorithms (Fonseca and Fleming, 1995) produce sets of non-dominated
objective vectors, instead of a single optimal objective value per run. Dealing
with random sets introduces additional difficulties into the analysis. In Section 3,
the attainment function is formally defined, and shown to relate closely to
established results in random closed set theory. In particular, it is shown to be
a measure analogous to the common mean, which considerably strengthens its
role as a measure of multiobjective optimiser performance. Based on the same
theory, variance-like and covariance-like measures are introduced which should
provide additional insight into multiobjective optimiser performance.

Finally, computational issues are discussed. The paper concludes with a sum- 
mary of the results, and a discussion of their implications for future work. 



2 Inferential Performance Assessment 

Optimiser performance can ultimately be understood in terms of the trade- 
off between the quality of the solutions produced and the computational effort 
required to produce those solutions, for a given class of optimisation problems. 
Experimentally, optimiser performance may be assessed in terms of: 

1. The time taken to produce a solution with a given level of quality (run time), 

2. The quality of the solutions produced within a given time, 

where time may be measured in terms of number of iterations, number of function 
evaluations, CPU time, elapsed time, etc., and solution quality is defined by 
the problem’s objective function(s). When considering stochastic optimisers, or 
deterministic optimisers under random initial conditions, both run time, in the 
first case, and solution quality, in the second case, are random, and the study of 
optimiser performance is reduced to the study of the corresponding distributions. 

Hoos and Stützle (1998) propose the estimation and analysis of run-time
distributions. It is worth noting that such time-to-event data may originate from 
improper distributions, since an optimiser may fail to find a solution with the 
desired quality in some runs. Also, the data may be subject to censoring whenever 
the actual run-time of the optimiser exceeds the practical time-limits of the 
experiment. Thus, the data may require special statistical treatment, of the kind 
usually encountered in statistical survival analysis. Run-time distributions are 
univariate distributions by definition, even if the problem considered involves 
multiple objectives. 

Fonseca and Fleming (1996) suggested the study of solution-quality
distributions. The outcome of a multiobjective optimisation run was considered to be the
set of non-dominated objective vectors evaluated during that run. In the
single-objective case, this reduces to a single objective value per run, corresponding to
the quality of the best solution(s) found, and leads to the study of univariate 
distributions. In the multiple objective case, however, solution-quality distribu- 
tions are either multivariate distributions, in the case where optimisers produce 
a single non-dominated vector per run, or set distributions, in the general case. 

In this context, optimisers may be seen as estimators for the global (Pareto) 
optimum of a (vector) function. Therefore, optimiser performance can be viewed 
in the light of the performance criteria usually considered for classical statistical 
estimators. However, it must be noted that optimisers are actually more than 
simple estimators, as they must also provide the actual solutions corresponding 
to their estimates of the function’s optimum. 

2.1 The Single-Objective Case 

As discussed above, the outcomes of single-objective optimisers consist of a single 
value per optimisation run, which is the objective value corresponding to the 
best solution(s) found. Therefore, one is interested in the stochastic behaviour 
of random variables X in R, and the performance of optimisers and that of point 
estimators may be seen in parallel. 

Good estimators should produce estimates which are close to the unknown 
estimand, both in terms of location and spread. The same applies to the out- 
comes of single-objective optimisers. Closeness in terms of location may be mea- 
sured by the difference between the mean or the median of the corresponding 
distributions and the unknown estimand. This is known as the mean-bias and 
the median-bias, respectively. Ideally, both should be zero. Possible measures 
of spread are the variance and the interquartile-range, both of which should 
be small. Alternatively, location and spread may be combined in terms of the 
mean-squared-error, which should also be small. 

Mean and variance are the first moment and the second centred moment of
a distribution. They are efficiently estimated by the arithmetic mean $\bar{X}$ and the
empirical variance $s^2$, respectively, when the underlying distribution is close to
normal. This is the case with many statistical estimators, at least for sufficiently
large sample sizes. The solution-quality distributions of optimisers, on the other
hand, can (and should) be very asymmetric. Moreover, objective-scale
information is ignored by some optimisers, which rely solely on order information.
Thus, estimating the median and the inter-quartile range through their
empirical counterparts might be preferred here, since quantiles are scale-invariant, i.e.
$\tau[\gamma(X)] = \gamma[\tau(X)]$ for any quantile $\gamma$ and any strictly monotonic
(increasing) transformation $\tau$ (Witting, 1985, p. 23).
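
The invariance of quantiles, and the lack of it for the mean, can be illustrated with
a small numerical experiment. The Python sketch below is an added illustration. The
exponential sample and the transformation `tau` (here `log1p`, a strictly increasing
function) are arbitrary assumptions, not taken from the paper.

```python
import math
import random
import statistics

rng = random.Random(0)
# Assumed data: a right-skewed sample of odd size, so the median is a data point.
sample = [rng.expovariate(1.0) for _ in range(101)]


def tau(x):
    """A strictly increasing transformation of the objective scale."""
    return math.log1p(x)


# Median: transforming then summarising equals summarising then transforming.
median_then_tau = tau(statistics.median(sample))
tau_then_median = statistics.median([tau(x) for x in sample])

# Mean: the two orders of operation generally disagree.
mean_then_tau = tau(statistics.mean(sample))
tau_then_mean = statistics.mean([tau(x) for x in sample])

print(median_then_tau, tau_then_median)
print(mean_then_tau, tau_then_mean)
```

Because `log1p` is concave, the mean of the transformed sample falls below the
transformed mean, whereas the two median values coincide.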

In addition to closeness considerations, point estimates and optimisation out- 
comes should follow a type of distribution easy to deal with. In the case of estima- 
tors, this is usually the normal distribution. Optimisation outcomes, however, 
must follow a distribution which is bounded below (considering minimisation 
problems). Its left end-point should be as close to the unknown minimum as 
possible, and it should be right skewed, so that outcomes are likely to be close
to the minimum. Given that the outcome of a single-objective optimisation run
is the minimum of all objective values computed in the course of the run, ideal
solution-quality distributions would be extreme-value distributions, the
estimation of which has been studied extensively in the literature (see for instance
Smith; Lockhart and Stephens; Embrechts et al.), both in a
parametric and in a semi/non-parametric setting.

The shape of a distribution can be assessed directly by estimating the
cumulative distribution function, $F_X(\cdot)$, which completely characterises the underlying
distribution. One may also wish to study specific aspects of the distribution, such 
as skewness (e.g. through the kurtosis) and tail behaviour (through end-point 
and tail-index estimation, for example). For minimisation problems, left and 
right-tail behaviour is related to best and worst-case performance, respectively. 

2.2 The Multiobjective Case 

When the optimisation problem is multiobjective, a whole front of
Pareto-optimal solutions in $\mathbb{R}^d$ is to be approximated, and the outcome of an
optimisation run may be a set of non-dominated objective vectors. For simplicity,
the situation where the outcome of a run consists of a single objective vector 
shall be considered first. 

Single objective vectors. The most common multivariate measure of location
is possibly the arithmetic mean, which is now a vector in $\mathbb{R}^d$. If the unknown
estimand is also a vector, as is the case with multivariate point estimators, this
is clearly appropriate. The mean-bias of a point estimator, for example, can be
written as the difference between the mean of the corresponding distribution and
the unknown estimand. Common measures of spread are the covariance matrix and
other measures related to it (Mood et al., 1974, p. 351ff). All formulate spread
in terms of deviation from the mean, which is a point.

In a multiobjective optimisation context, however, both bias and spread 
should be understood in terms of Pareto fronts. Note that the mean-vector of 
a number of non-dominated vectors could be located beyond a concave
Pareto-optimal front to be approximated, outside the collection of all possible outcomes!
Useful alternative measures of bias and spread shall be given later in Section 3.

The shape of a multivariate distribution can be assessed through estima- 
tion of the cumulative multivariate distribution function, even though this is 
more challenging computationally than the corresponding univariate case. Again, 
solution-quality distributions should be skewed in the sense that outcomes should 
be likely to be close to the unknown Pareto front. Note that the Pareto front 
imposes a bound on the support of solution-quality distributions. Multivariate 
extreme-value theory is currently an active, but very specialised, area of research.

Multiple non-dominated objective vectors. Outcomes are represented by
the random (point) sets $\mathcal{X} = \{X_j \in \mathbb{R}^d,\ j = 1, \ldots, M\}$, where the
elements $X_j$ are non-dominated within the set and random, and the number $M$ of
elements is random. Performance assessment oriented towards solution quality, as
discussed so far, must take into account the particular set-character of the
distributions involved.

Statistical estimators which produce a set of non-dominated vectors in $\mathbb{R}^d$
when applied to a data-set are not known to the authors, but curve estimators,
seen as (continuous) random curve sets in $\mathbb{R}^2$, could be related. Bias measures
for curve estimators $\hat{g}(\cdot)$, such as the average sum of squares

$$\frac{1}{k} \sum_{i=1}^{k} [\hat{g}(Z_i) - g(Z_i)]^2$$

or the supremum-norm

$$\sup_{i=1,\ldots,k} |\hat{g}(Z_i) - g(Z_i)|,$$

where the $Z_i \in \mathbb{R}$ are either random or deterministic, might suggest suitable
analogues for the performance assessment of multiobjective optimisers. If the
difference is replaced by the minimum Euclidean distance between the random
outcomes $X_j$ and the Pareto-optimal front to be approximated, one obtains
measures similar in spirit to the generational distance, proposed by Van Veldhuizen
and Lamont (2000).
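
As a rough illustration of such distance-based analogues, the Python sketch below
computes the mean squared and the maximum of the minimum Euclidean distances from a
set of outcomes to a reference front. It is not the exact generational distance of
Van Veldhuizen and Lamont. The linear front, its discretisation into 101 points, and
the outcome vectors are assumptions made for this example only.

```python
import math


def min_distance(point, front):
    """Smallest Euclidean distance from one outcome to a discretised front."""
    return min(math.dist(point, p) for p in front)


def mean_squared_distance(outcomes, front):
    """Analogue of the average sum of squares."""
    return sum(min_distance(x, front) ** 2 for x in outcomes) / len(outcomes)


def sup_distance(outcomes, front):
    """Analogue of the supremum-norm."""
    return max(min_distance(x, front) for x in outcomes)


# Assumed reference front: the segment f1 + f2 = 1, sampled at 101 points.
front = [(t / 100, 1 - t / 100) for t in range(101)]
# Assumed outcome set of a single run.
outcomes = [(0.0, 1.0), (0.5, 0.7), (1.0, 0.2)]

print(mean_squared_distance(outcomes, front))
print(sup_distance(outcomes, front))
```

Both values summarise closeness to the front only; they say nothing about how the
outcomes are spread along it.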

Unlike that of curve estimators, the performance of multiobjective optimisers is
additionally affected by the variability of the outcomes within a set and by
how uniformly the outcomes are distributed along the final trade-off surface
(Zitzler, 1999; Zitzler et al.; Van Veldhuizen and Lamont, 2000). Hence,
taking into account the overall point-set character of the outcomes promises
to be much more informative than just relying on summary measures such as
the above. Random closed set theory (Matheron, 1975; Kendall, 1974) addresses
precisely this issue. Note that the outcome-set $\mathcal{X}$ is closed.

The mean of a random-set distribution has been defined in various set-valued
ways. One of the most popular is the Aumann-mean, which is defined as “the
set of expected selections, where a selection is any random vector that almost
surely belongs to the random set” (Cressie, 1993, p. 751). A possible estimator
for this mean of some (general) random closed set $W$ is formulated as

$$\bar{W}_n = \frac{1}{n} (W_1 \oplus W_2 \oplus \ldots \oplus W_n),$$

which is the Minkowski average of $n$ independent copies $W_1, \ldots, W_n$ of $W$
(Cressie, 1993, p. 751). Note that the Minkowski addition of two sets $A_1$ and $A_2$
is defined as

$$A_1 \oplus A_2 = \{a_1 + a_2 \mid a_1 \in A_1,\ a_2 \in A_2\}.$$

Fig. 1. The estimated Aumann-mean for two sets of non-dominated points.

Clearly, the estimated Aumann-mean of the outcome set $\mathcal{X}$ of a multiobjective
optimiser contains many more elements than the observed sets themselves
(see Figure 1). In addition, the theoretical mean is typically a convex set, and
does not contain exclusively non-dominated elements. As for the vector-mean,
some elements might even be located beyond the Pareto-optimal front to be
approximated, if it is concave. The Aumann-mean is therefore unsuitable as a measure
of location in an optimisation context.
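
The growth in the number of elements, and the appearance of dominated points, is
easy to reproduce for finite point sets. The following Python sketch is an added
illustration with two assumed outcome sets of three non-dominated points each; the
function names are hypothetical.

```python
from itertools import product


def minkowski_average(*sets):
    """Minkowski average (W_1 + ... + W_n) / n of finite point sets."""
    n = len(sets)
    return {
        tuple(sum(coords) / n for coords in zip(*combo))
        for combo in product(*sets)
    }


def is_dominated(p, points):
    """True if some other point is <= p in every coordinate (minimisation)."""
    return any(q != p and all(a <= b for a, b in zip(q, p)) for q in points)


# Two assumed outcome sets, each consisting of non-dominated vectors.
w1 = {(1, 4), (2, 2), (4, 1)}
w2 = {(1, 5), (3, 3), (5, 1)}

mean_set = minkowski_average(w1, w2)
dominated = sorted(p for p in mean_set if is_dominated(p, mean_set))

print(len(mean_set))
print(dominated)
```

Every pair of points, one from each set, contributes an element to the average, so
the estimated mean is larger than either observed set and includes points that are
dominated within it.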

An alternative (empirical) mean-formula appears to be more useful. It is the
empirical covering function, which is defined for a (general) random set $W$ as

$$p_n(z) = \frac{1}{n} \sum_{i=1}^{n} I\{z \in W_i\}, \quad z \in \mathbb{R}^d. \qquad (1)$$

The random sets $W_1, \ldots, W_n$ are independently and identically distributed like
$W$, and $I\{\cdot\}$ denotes the indicator function. The empirical covering function has
been applied in the area of “Particle Statistics” to describe the average of
possibly non-convex particles. Note that particles must be transformed into sets first,
by choosing “reasonable locations and orientations” for them (Stoyan, 1998).

The attainment function and its empirical estimator
(Fonseca and Fleming, 1996) turn out to be equivalent to the theoretical
covering function $p(z) = P(z \in W)$ and its empirical counterpart. The definition
of the attainment function and additional theoretical results are given in the
following section.

3 The Attainment Function

3.1 Definition, Interpretation, and Estimation 

The attainment function provides a description of the distribution of an outcome
set $\mathcal{X} = \{X_j \in \mathbb{R}^d,\ j = 1, \ldots, M\}$ in a simple and elegant way, using the
notion of goal-attainment. It is defined by the function
$\alpha_{\mathcal{X}}(\cdot): \mathbb{R}^d \to [0, 1]$ with

$$\alpha_{\mathcal{X}}(z) = P(X_1 \le z \vee X_2 \le z \vee \ldots \vee X_M \le z) = P(\mathcal{X} \trianglelefteq z).$$

The symbol “$\vee$” denotes the logical “or”. The expression $\alpha_{\mathcal{X}}(z)$ corresponds
to the probability of at least one element of $\mathcal{X}$ being smaller than or equal to
$z \in \mathbb{R}^d$, that is, the probability of an optimiser finding at least one solution
which attains the goal-vector $z$ in a single run. Clearly, the attainment function
is a generalisation of the multivariate cumulative distribution function
$F_X(z) = P(X \le z)$. It reduces to the latter when $M = 1$, i.e. when the optimiser
produces only one random objective vector per optimisation run.

The attainment function simultaneously addresses the three criteria of
solution quality in the multiobjective context pointed out by Zitzler and colleagues
(Zitzler, 1999; Zitzler et al.), although not separately: a long tail (in the
multidimensional sense) away from the true Pareto front may be due to the loca- 
tion of individual outcome elements in some runs (first criterion), to the lack of 
uniformity of the elements within runs (second criterion), or to the small extent 
of the outcome non-dominated sets (third criterion) . 

The attainment function can be estimated via its empirical counterpart

$$\alpha_n(z) = \frac{1}{n} \sum_{i=1}^{n} I\{\mathcal{X}_i \trianglelefteq z\},$$

the empirical attainment function, where the random sets $\mathcal{X}_1, \ldots, \mathcal{X}_n$
correspond to the outcomes of $n$ independent runs of the optimiser. Note the
similarity to the empirical covering function (1).
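
Evaluating the empirical attainment function at a single goal is straightforward.
The Python sketch below is an added illustration for minimisation problems; the
three outcome sets stand in for the results of three hypothetical optimisation runs,
and the function names are not taken from any existing library.

```python
def attains(outcome_set, z):
    """True if at least one vector of the set is <= z in every objective."""
    return any(all(xi <= zi for xi, zi in zip(x, z)) for x in outcome_set)


def empirical_attainment(runs, z):
    """Fraction of runs whose outcome set attains the goal vector z."""
    return sum(attains(run, z) for run in runs) / len(runs)


# Assumed outcomes of three independent runs (sets of non-dominated vectors).
runs = [
    [(1, 4), (2, 2), (4, 1)],
    [(1, 5), (3, 3), (5, 1)],
    [(2, 3)],
]

print(empirical_attainment(runs, (3, 3)))  # attained in every run
print(empirical_attainment(runs, (2, 2)))  # attained in the first run only
print(empirical_attainment(runs, (0, 0)))  # never attained
```

The third run produces a single vector, which shows that runs with one and with
many outcome vectors are handled by the same estimator.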



3.2 The Link to Random Closed Set Theory 

The attainment function can be written in terms of so-called “hit-or-miss
probabilities”, which are of fundamental importance in random closed set theory. For
this, an alternative representation of the outcome set
$\mathcal{X} = \{X_j \in \mathbb{R}^d,\ j = 1, \ldots, M\}$ with equivalent stochastic behaviour is
chosen. It is the random (closed) set

$$\mathcal{Y} = \{y \in \mathbb{R}^d \mid X_1 \le y \vee X_2 \le y \vee \ldots \vee X_M \le y\}
= \{y \in \mathbb{R}^d \mid \mathcal{X} \trianglelefteq y\} \qquad (2)$$

describing the region in $\mathbb{R}^d$ which is attained by $\mathcal{X}$ (see Figure 2). Using this
alternative representation of $\mathcal{X}$, the attainment function may be expressed as

$$\alpha_{\mathcal{X}}(z) = P(z \in \mathcal{Y}), \quad z \in \mathbb{R}^d.$$

Fig. 2. Outcome set $\mathcal{X}$ with non-dominated realizations $x_1$, $x_2$ and $x_3$ and the
set $\mathcal{Y}$ (here as a realization).

Hence, the attainment function of the outcome-set $\mathcal{X}$ is identical to the covering
function of the associated random set $\mathcal{Y}$. Denoting $n$ independent copies of the
random set $\mathcal{Y}$ as $\mathcal{Y}_1, \ldots, \mathcal{Y}_n$, one can rewrite the empirical attainment
function as

$$\alpha_n(z) = \frac{1}{n} \sum_{i=1}^{n} I\{z \in \mathcal{Y}_i\},$$

which shows the identity between the empirical attainment function of $\mathcal{X}$ and
the empirical covering function of $\mathcal{Y}$ (compare with (1)).

Finally, the (theoretical) attainment function of $\mathcal{X}$ is identical to the hitting
function or capacity functional (see e.g. Cressie (1993), Goutsias (1998)) of $\mathcal{Y}$
with support restricted to the collection of all one-point sets $\{z\}$ in $\mathbb{R}^d$. Hence,
it can be expressed via hit-or-miss probabilities as

$$\alpha_{\mathcal{X}}(z) = P(\mathcal{Y} \cap \{z\} \neq \emptyset). \qquad (3)$$

In general, the hitting function is defined over all compact subsets $K$ in $\mathbb{R}^d$ (a
definition for spaces more general than $\mathbb{R}^d$ is not of interest here). It fully
characterises the stochastic behaviour of a random closed set in $\mathbb{R}^d$, and is of
essential importance in random closed set theory. Note that the attainment function
does not contain enough information to uniquely describe the stochastic behaviour of
$\mathcal{X}$ and of $\mathcal{Y}$.

3.3 First-Order Moment Concepts

The hitting function of a (general) random closed set $W$ defined over all compact
subsets $K$ in $\mathbb{R}^d$ is identical to the general first-order moment measure
$C^{(1)}_W(\cdot)$ of the same set, i.e.

$$C^{(1)}_W(K) = P(W \cap K \neq \emptyset).$$

The above definition generalises the notion of first-order moment of a binary
random field $\{b(z) \mid z \in \mathbb{R}^d\}$, which is a collection of random function values
$b(z)$ where $b(z)$ can be 0 or 1. Here, the first-order moment (measure) is defined
as

$$P(b(z) = 1) = P(W \cap \{z\} \neq \emptyset),$$

where the random closed set $W$ is related to the binary random field according
to $W = \{z \in \mathbb{R}^d \mid b(z) = 1\}$. See Goutsias (1998).

As its formulation in (3) shows, the attainment function $\alpha_{\mathcal{X}}(\cdot)$ is the
first-order moment measure of the binary random field derived from the random set
$\mathcal{Y}$ in (2) so that

$$\mathcal{Y} = \{z \in \mathbb{R}^d \mid b(z) = 1\}.$$

In other words, the attainment function $\alpha_{\mathcal{X}}(z)$ is the expected value of the
binary random variable $I\{\mathcal{Y} \cap \{z\} \neq \emptyset\} = I\{\mathcal{X} \trianglelefteq z\}$ for all goals
$z \in \mathbb{R}^d$. Hence, it makes sense to see the attainment function as a mean-measure
for the set-distribution of $\mathcal{Y}$ and, in the wider sense, also of the outcome-set
$\mathcal{X}$. Note again that the empirical covering function is used as a mean-measure in
particle statistics.

As remarked above, the attainment function reduces to the (multivariate)
distribution function $F_X(\cdot)$ for singleton sets $\mathcal{X} = \{X\}$. The distribution
function is a mean-measure for the distribution of the random set
$\mathcal{Y} = \{y \in \mathbb{R}^d \mid X \le y\}$ and, in a wider sense, also of $\mathcal{X} = \{X\}$. Thus,
$F_X(\cdot)$ is a suitable alternative to the mean-vector of single objective vectors as
a measure of location, when the optimisation problem is multiobjective and a whole
Pareto-front is to be approximated (see the discussion in Section 2.2).

A notion of bias may be constructed in terms of the difference between the
attainment function $\alpha_{\mathcal{X}}(z)$ associated with the optimisation outcomes and the
ideal attainment function $\alpha_I(z) = I\{z \in \mathcal{Y}^*\}$, where $\mathcal{Y}^*$ denotes the
deterministic region attained by the true Pareto-optimal set of the problem. The bias, in this
sense, is a function of a goal z, and indicates how far from ideal the optimiser is 
regarding the attainment of that goal. 



3.4 Higher-Order Moment Concepts 

When the first-order moment does not fully characterise a distribution,
higher-order moments can contribute additional information about the stochastic
behaviour of a random variable/vector/set. Depending on the actual
distribution, a finite number of higher-order moments may, or may not, be enough to
fully characterise it. In statistics, this problem is known as the problem of
moments (Mood et al., 1974, p. 81).

The attainment function, as mentioned before, does not uniquely determine
the underlying set-distribution of $\mathcal{Y}$ (and of $\mathcal{X}$). In fact, it just addresses one
aspect of optimiser performance, which is location-closeness. Closeness of the
approximations in terms of spread (variability across runs) could be described
by the variance (second centred moment). The second-order moment (measure)
of $\mathcal{Y}$,

$$C^{(2)}_{\mathcal{Y}}(\{z_1\}, \{z_2\}) = P[(\mathcal{X} \trianglelefteq z_1) \wedge (\mathcal{X} \trianglelefteq z_2)]$$

(originally defined for the binary random field associated with $\mathcal{Y}$, see above),
describes the probability of hitting two goals $z_1$ and $z_2$ simultaneously. Together
with the first-order moment $C^{(1)}_{\mathcal{Y}}(\{z\})$, the attainment function, it can be used
to explain the dependence structure between the two binary random variables
$I\{\mathcal{Y} \cap \{z_1\} \neq \emptyset\}$ and $I\{\mathcal{Y} \cap \{z_2\} \neq \emptyset\}$. The difference



$$C^{(2)}_{\mathcal{Y}}(\{z_1\}, \{z_2\}) - C^{(1)}_{\mathcal{Y}}(\{z_1\}) \cdot C^{(1)}_{\mathcal{Y}}(\{z_2\})
= P[(\mathcal{X} \trianglelefteq z_1) \wedge (\mathcal{X} \trianglelefteq z_2)] - \alpha_{\mathcal{X}}(z_1) \cdot \alpha_{\mathcal{X}}(z_2)$$

can be seen as a form of covariance. If it equals zero, the two random variables are
uncorrelated. In particular, if the event of attaining a goal $z_1$ is independent
of the event of attaining the goal $z_2$, then the difference is zero (compare with
Goutsias (1998)). Dependencies between more than two goals can be explored
through higher-order moments of $\mathcal{Y}$. Eventually one can hope to completely
characterise the distribution of the outcome-set $\mathcal{X}$ (through $\mathcal{Y}$).

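Setting $z_1 = z_2 = z$, one obtains

$$C^{(2)}_{\mathcal{Y}}(\{z\}, \{z\}) - C^{(1)}_{\mathcal{Y}}(\{z\}) \cdot C^{(1)}_{\mathcal{Y}}(\{z\})
= \alpha_{\mathcal{X}}(z) - \alpha^2_{\mathcal{X}}(z) = \beta_{\mathcal{X}}(z),$$

which is simply the variance of the binary random variable
$I\{\mathcal{Y} \cap \{z\} \neq \emptyset\} = I\{\mathcal{X} \trianglelefteq z\}$ for all $z \in \mathbb{R}^d$. The corresponding
empirical estimator would be

$$\beta_n(z) = \frac{1}{n} \sum_{i=1}^{n} (\alpha_n(z) - I\{z \in \mathcal{Y}_i\})^2,$$

which is rather similar to the variance estimator defined by Stoyan (1998) for
particle data.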
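
Both second-order quantities can be estimated from the same run data as the
empirical attainment function. The Python sketch below is an added illustration.
The variance estimator follows the formula above, whereas the plug-in covariance
estimator (empirical joint attainment frequency minus the product of the two
empirical attainment values) is an assumption of this sketch and is not defined in
the text. The run data are hypothetical.

```python
import math


def attains(outcome_set, z):
    """True if at least one vector of the set is <= z in every objective."""
    return any(all(xi <= zi for xi, zi in zip(x, z)) for x in outcome_set)


def eaf(runs, z):
    """Empirical attainment function at goal z."""
    return sum(attains(run, z) for run in runs) / len(runs)


def variance_estimate(runs, z):
    """Mean squared deviation of the attainment indicators from their mean."""
    a = eaf(runs, z)
    return sum((a - attains(run, z)) ** 2 for run in runs) / len(runs)


def covariance_estimate(runs, z1, z2):
    """Assumed plug-in estimator: joint frequency minus product of marginals."""
    joint = sum(attains(run, z1) and attains(run, z2) for run in runs) / len(runs)
    return joint - eaf(runs, z1) * eaf(runs, z2)


# Assumed outcomes of three independent runs.
runs = [
    [(1, 4), (2, 2), (4, 1)],
    [(1, 5), (3, 3), (5, 1)],
    [(2, 3)],
]

print(variance_estimate(runs, (2, 2)))
print(covariance_estimate(runs, (2, 2), (1, 4)))
print(covariance_estimate(runs, (2, 2), (3, 3)))
```

A goal that is attained in every run, such as (3, 3) here, has zero empirical
covariance with any other goal, because its indicator is constant across runs.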



4 Computational Issues 

The practical usefulness of the attainment function as a measure of multiobjective
optimiser performance is tied to the ability to estimate it from experimental
data. The computation of the empirical attainment function (EAF) in arbitrary
dimensions (i.e., number of objectives) is related to the computation of the
multivariate empirical cumulative distribution function (ECDF), but computing the
multivariate ECDF efficiently is not considered an easy task, either (see Justel
et al.). In fact, whereas the univariate ECDF exhibits discontinuities at
the data points only, the multivariate ECDF exhibits discontinuities at the data 
points and at other points, the coordinates of which are combinations of the 
coordinates of the data points. As the number of dimensions increases, the num- 
ber of points needed to describe the ECDF (and the EAF) may easily become 
too large to store. Storing all relevant points may not always be necessary, how- 
ever. The maximum difference between two EAFs, for example, can be computed 
without that requirement. 
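
One simple way to obtain the maximum difference without storing the full
description of either EAF is to enumerate lazily the candidate goals whose
coordinates are combinations of the coordinates of the data points, since both step
functions can change value only there. The Python sketch below is an added
illustration of this idea, not an efficient algorithm and not the authors'
implementation: the number of candidates grows exponentially with the number of
objectives. The run data in the usage example are hypothetical.

```python
from itertools import product


def attains(outcome_set, z):
    """True if at least one vector of the set is <= z in every objective."""
    return any(all(xi <= zi for xi, zi in zip(x, z)) for x in outcome_set)


def eaf(runs, z):
    """Empirical attainment function at goal z."""
    return sum(attains(run, z) for run in runs) / len(runs)


def max_eaf_difference(runs_a, runs_b):
    """Largest absolute difference between two EAFs over all candidate goals."""
    d = len(runs_a[0][0])
    # Per objective, the sorted coordinate values occurring in either data set.
    coords = [
        sorted({x[k] for run in runs_a + runs_b for x in run}) for k in range(d)
    ]
    # product() yields candidates one at a time; nothing is stored.
    return max(abs(eaf(runs_a, z) - eaf(runs_b, z)) for z in product(*coords))


# Assumed outcomes of two optimisers, two runs each.
runs_a = [[(1, 1)], [(1, 1)]]
runs_b = [[(2, 2)], [(1, 3), (3, 1)]]

print(max_eaf_difference(runs_a, runs_b))
```

In this example the goal (1, 1) is attained in every run of the first optimiser and
in no run of the second, so the maximum difference takes its largest possible value.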

Similar considerations apply, to an even greater extent, to the estimation of 
the second-order moments. Work in this area is currently in progress. 



5 Conclusions and Future Perspectives 

The performance assessment of stochastic (multiobjective) optimisers was dis- 
cussed in the light of existing criteria for the performance of classical statistical 
estimators, and theoretical foundations for the attainment function were estab- 
lished within the field known as random closed set theory. 

The outcomes of multiobjective optimisers are random point sets in $\mathbb{R}^d$,
denoted by $\mathcal{X}$. Alternatively, they can be represented by (continuous) random
closed sets $\mathcal{Y}$ of a particular type with equivalent stochastic behaviour.
Considering minimisation problems, the sets $\mathcal{Y}$ are unbounded towards $+\infty$ in every
dimension, and are bounded below by the elements of $\mathcal{X}$.

The attainment function of an outcome set $\mathcal{X}$ is a first-order moment measure
of the corresponding set $\mathcal{Y}$, defined over all possible one-point sets in $\mathbb{R}^d$ (the
general moment measure is defined over all compact subsets in $\mathbb{R}^d$). Comparing the
performance assessment of optimisers with that of statistical estimators showed
that the attainment function is a kind of mean measure of the outcome-set $\mathcal{X}$.
As such, it does indeed address a very sensible aspect of the stochastic behaviour
of the optimiser, i.e. the location of the approximation. A suitable definition of
bias was also suggested, which allows the location of the approximation to be
seen with respect to the unknown Pareto-front.

The attainment function is a generalisation of the (multivariate) cumulative
distribution function to the case of random non-dominated point sets. Thus,
the cumulative distribution function can also be seen as a mean measure for
the set Y describing the region in R^d which is attained by a single objective
vector X. In a wider sense, the cumulative distribution function can be seen as a
mean measure of {X} itself. Regarding the empirical attainment function, it is
hoped that it preserves some of the good properties of the empirical cumulative 
distribution function. Also, the attainment function makes it possible to compare 
the performance of multiobjective optimisers regardless of whether they produce 
one or many objective vectors per run! 
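
As an illustration of the empirical attainment function discussed above, the following minimal Python sketch estimates, for a goal vector z, the fraction of runs whose outcome set contains at least one point that is no worse than z in every objective (minimisation is assumed). The function names and the toy data are illustrative assumptions, not taken from the paper.

```python
from typing import Sequence

Point = Sequence[float]


def attains(outcome_set: Sequence[Point], z: Point) -> bool:
    """True if some point of the outcome set is <= z in every objective."""
    return any(all(xi <= zi for xi, zi in zip(x, z)) for x in outcome_set)


def empirical_attainment(runs: Sequence[Sequence[Point]], z: Point) -> float:
    """Fraction of runs whose outcome set attains the goal z (minimisation)."""
    return sum(attains(run, z) for run in runs) / len(runs)


# Three hypothetical optimisation runs with one or more objective vectors each.
runs = [
    [(1.0, 4.0), (2.0, 2.0)],
    [(3.0, 1.0)],
    [(4.0, 4.0)],
]
print(empirical_attainment(runs, (2.0, 3.0)))
```

When every run returns exactly one objective vector, this estimate coincides with the multivariate empirical cumulative distribution function evaluated at z.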
The attainment function does not fully characterise the distribution of the
random sets X or Y. Extensions of the attainment function based on higher-order
moment concepts were introduced which could contribute additional
information. They might eventually lead to the full characterisation of the
distributions considered. This perspective gives the attainment function an advantage
over performance measures such as the volume measure of the attained region
Y, which is related, for example, to the “size of the dominated space” in Zitzler
(1999, p. 43f). In Matheron’s (1975) theory, the distribution of a random closed
set is characterised by hit-or-miss events (on which the attainment function is
based) and “not by measures or contents” (Stoyan, 1998).

The results presented here are mainly of probabilistic nature. They are needed
to support inferential methodology such as the test procedure for the comparison
of optimiser performance used in Shaw et al. (1999), which is based on
the maximum difference between two empirical attainment functions and on a
permutation argument (see Good (2000)). Unlike the methodology proposed by
Knowles and Corne (2000), such a test does not depend on auxiliary lines or suffer
from multiple testing issues (see also Fonseca and Fleming (1996)). To a great
extent, inferential methodology which truly exploits the attainment function and
related concepts has yet to be developed.
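
A test of the kind described above can be sketched as follows. This is only an assumed, simplified reading of the procedure: the statistic is the maximum absolute difference between two empirical attainment functions, evaluated on the grid of goals whose coordinates are combinations of the data coordinates, and its null distribution is approximated by randomly reassigning runs to the two optimisers. All names and parameter values are hypothetical, and the full grid becomes expensive as the number of objectives grows.

```python
import random
from itertools import product


def attains(outcome_set, z):
    return any(all(xi <= zi for xi, zi in zip(x, z)) for x in outcome_set)


def eaf(runs, z):
    return sum(attains(run, z) for run in runs) / len(runs)


def candidate_goals(runs):
    """Goals whose coordinates are combinations of the data coordinates."""
    points = [x for run in runs for x in run]
    axes = [sorted({x[k] for x in points}) for k in range(len(points[0]))]
    return list(product(*axes))


def max_eaf_difference(runs_a, runs_b, goals):
    return max(abs(eaf(runs_a, z) - eaf(runs_b, z)) for z in goals)


def permutation_test(runs_a, runs_b, n_perm=999, seed=0):
    """Return the observed statistic and a Monte Carlo permutation p-value."""
    rng = random.Random(seed)
    pooled = list(runs_a) + list(runs_b)
    goals = candidate_goals(pooled)
    observed = max_eaf_difference(runs_a, runs_b, goals)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random relabelling of the runs
        a, b = pooled[:len(runs_a)], pooled[len(runs_a):]
        if max_eaf_difference(a, b, goals) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)
```

Because both empirical attainment functions are step functions that change only at the grid goals, evaluating the difference on this grid is sufficient to find its maximum.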

Finally, the solution-quality view of optimiser performance could be combined
with the run-time perspective by considering time as an additional objective to be
minimised. The outcome of an optimisation run would then be the set of
non-dominated objective vectors, augmented with time, evaluated during the run.
Abstract. Many real-world scientific and engineering applications involve
finding innovative solutions to “hard” Multiobjective Optimization
Problems (MOP). Various Multiobjective Evolutionary Algorithms
(MOEA) have been developed to obtain MOP Pareto solutions. A
particularly exciting MOEA is the MOMGA, which is an extension of the
single-objective building block (BB) based messy Genetic Algorithm.
The intent of this discussion is to illustrate that modifications made to
the Multi-Objective messy GA (MOMGA) have further improved its
efficiency, resulting in the MOMGA-II. The MOMGA-II uses a probabilistic
BB approach to initializing the population, referred to as Probabilistically
Complete Initialization. This has the effect of improving the efficiency
of the MOMGA through the reduction of computational bottlenecks.
Similar statistical results have been obtained using the MOMGA-II as
compared to the results of the original MOMGA as well as those obtained
by other MOEAs as tested with standard generic MOP test suites.



1 Introduction 



We have developed an innovative Multiobjective Evolutionary Algorithm
(MOEA), the Multi-Objective messy GA (MOMGA), that takes a novel approach
to solving Multiobjective Optimization Problems (MOPs) [1,2,3,4]. Part of the
novel approach of the MOMGA is its extension of the existing single-objective 
building block-based messy Genetic Algorithm (mGA) to the MOP domain []] 
O]. Building Blocks (BBs) define genes of chromosomes and contain the information
that the EA is attempting to combine, evaluate and move towards the
Pareto front, as an extension of the unproven Building Block Hypothesis [5]. The BB
approach is used in the MOMGA to increase the number of “good” BBs that
are present in each subsequent generation. These “good” BBs represent good
material present in the current genotype population to be further exploited and
used to move towards optimal solutions of a specific multiobjective problem. The
associated test suite indicates that the MOMGA is as good as, if not better than,
other MOEA approaches for unconstrained numerical problems in a generic test
suite.

As MOEAs have developed, studies of associated data structures and opera- 
tors (recombination, mutation, selection) have reflected a variety of approaches 
yielding good results across a variety of MOPs. In general such MOEAs must 

1) provide for an effective and efficient search towards the Pareto front, and 

2) provide for a “uniform” or diverse distribution of the (finite) points defining a
calculated known Pareto front.

Contemporary MOEA examples include
the NSGA, NSGA-II, MOGA, NPGA, PAES, and the SPEA, which indirectly
manipulate BBs, and the MOMGA and the MOMGA-II, which explicitly and directly
manipulate building blocks [1,6,7,8]. In general, the successful evolution of
all these MOEAs required the employment of more complex selection operators,
such as niching, crowding, sharing, elitist, ranking, tournament or thresholding,
to achieve the above MOP Pareto front criteria. Since most implicit BB MOEAs
are converging on similar algorithmic architectures (operators, data structures), one
should observe that they should have similar performance for appropriate parameter
values. Of course, the No Free Lunch (NFL) theorem implies that development of a
generic robust MOEA will be quite difficult.

Depending on the particular MOP, various MOEAs have different statistical 
performance as expected across a variety of parameter settings and evaluation 
metrics m- Numerical unconstrained and constrained MOP test suites are 
considered here m- It is the intent of this paper then to indicate that the 
MOMGA-II is an excellent MOEA for solving MOPs with various characteristics 
and should be considered for a multitude of real-world applications. The basics 
of the MOMGA and MOMGA-II, appropriate MOP test suites, and evaluation 
metrics are discussed. Statistical analysis of the MOMGA, MOMGA-II, and 
other MOEAs using specific test suite numerical functions is presented with 
various conclusions indicated. 



2 Multiobjective Problems 

Although single-objective optimization problems may have a unique optimal 
solution, MOPs usually have a possibly uncountable set of solutions, which when 
evaluated produce vectors whose components represent trade-offs in decision 
space. 

Pareto optimal solutions are those solutions within the search space whose 
corresponding objective vector components cannot be all simultaneously im- 
proved. These solutions are also termed non-inferior, admissible, or efficient 
solutions. Their corresponding vectors are termed nondominated; selecting a
vector(s) from this vector set implicitly indicates acceptable Pareto optimal solutions
(genotypes). These solutions may have no clearly apparent relationship
besides their membership in the Pareto optimal set. It is simply the set of all solutions
whose associated vectors are nondominated; we stress that these solutions
are classified as such based on their phenotypical expression. Their expression
(the nondominated vectors), when plotted in criterion (phenotype) space, is
known as the Pareto front.

A MOEA’s structure can easily lead to confusion when identifying Pareto
components. During MOEA execution, a “local” set of Pareto optimal solutions
(with respect to the current MOEA generational population) is determined at
each EA generation and termed Pcurrent(t), where t represents the generation
number. Many MOEA implementations also use a secondary population, storing
all/some Pareto optimal solutions found through the generations [1,9]. Because a
solution’s classification as Pareto optimal depends upon the context within which
it is evaluated (i.e., the given set of which it’s a member), corresponding vectors
of this set must be (periodically) tested, removing solutions whose associated
vectors are dominated.

We term this secondary population Pknown(t). This term is also annotated
with t (representing completion of t generations) to reflect its possible changes
in membership during MOEA execution. Pknown(0) is defined as ∅ and Pknown
alone as the final, overall set of Pareto optimal solutions returned by an MOEA.

Different secondary population storage strategies exist; the simplest is when
Pcurrent(t) is added at each generation (i.e., Pcurrent(t) ∪ Pknown(t − 1)). At
any given time, Pknown(t) is thus the set of Pareto optimal solutions yet found
by the MOEA through generation t. Of course, the true Pareto optimal solution
set (termed Ptrue) is not explicitly known for problems of any difficulty. Ptrue
is defined by the functions composing an MOP; it is fixed and does not change.
Because of the manner in which Pareto optimality is defined, Pcurrent(t) is always
a non-empty solution set.
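
The simplest storage strategy described above can be sketched in Python as follows. This is a minimal illustration assuming minimisation of every objective and hashable solutions; the helper names and the stand-in objective function are assumptions made for the example and are not taken from the MOMGA code.

```python
def dominates(u, v):
    """u dominates v (minimisation): no worse everywhere, better somewhere."""
    return all(a <= b for a, b in zip(u, v)) and any(a < b for a, b in zip(u, v))


def nondominated(solutions, evaluate):
    """Keep the solutions whose objective vectors no other solution dominates."""
    vectors = [evaluate(s) for s in solutions]
    return [s for s, fs in zip(solutions, vectors)
            if not any(dominates(ft, fs) for ft in vectors)]


def update_known(p_known_prev, population, evaluate):
    """Pknown(t): nondominated members of Pcurrent(t) united with Pknown(t - 1)."""
    p_current = nondominated(population, evaluate)
    merged = list(dict.fromkeys(p_known_prev + p_current))  # drop duplicates
    return nondominated(merged, evaluate)


# Stand-in bi-objective function (two quadratics), used only for illustration.
evaluate = lambda x: (x * x, (x - 2) ** 2)

p_known = []  # Pknown(0) is the empty set
for population in ([-1.0, 0.5, 3.0], [1.0, 2.5, 0.5]):
    p_known = update_known(p_known, population, evaluate)
print(p_known)
```

Re-testing the merged set at every generation removes members of Pknown(t − 1) that become dominated by newly found solutions, which is the periodic test the text calls for.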

Pcurrent(t), Pknown, and Ptrue are sets of MOEA genotypes where each set’s
phenotypes form a Pareto front. We term the associated Pareto front for each
of these solution sets as PFcurrent(t), PFknown, and PFtrue. Thus, when using
an MOEA to solve MOPs, the implicit assumption is that one of the following
holds: Pknown = Ptrue, Pknown ⊂ Ptrue, or PFknown ∈ [PFtrue, PFtrue + ε] over
some norm (Euclidean, RMS, etc.).



3 MOMGA 

The MOMGA implements a deterministic process to produce the enumeration 
of all possible BBs, of a specified size, for the initial population. This process 
is referred to as Partially Enumerative Initialization (PEI). Thus, the MOMGA 
explicitly utilizes these building blocks in combination to attempt to solve for the 
optimal solutions in multiobjective problems. While finding the optimal solution 
is never guaranteed, the MOMGA statistically finds optimal or near optimal 
solutions to the functions presented in our standard MOP test suite jil2i;i] . The 
pseudocode for the MOMGA is presented in Figure 1.

For n = 1 to k
    Perform Partially Enumerative Initialization
    Evaluate Each Population Member’s Fitness (w.r.t. k Templates)
    // Primordial Phase
    For i = 1 to Maximum Number of Primordial Generations
        Perform Tournament Thresholding Selection
        If (Appropriate Number of Generations Accomplished)
            Then Reduce Population Size
        Endif
    End Loop
    // Juxtapositional Phase
    For i = 1 to Maximum Number of Juxtapositional Generations
        Cut-and-Splice
        Evaluate Each Population Member’s Fitness (w.r.t. k Templates)
        Perform Tournament Thresholding Selection and Fitness Sharing
        Pknown(t) = Pcurrent(t) ∪ Pknown(t − 1)
    End Loop
    Update k Competitive Templates (Using Best Value Known in Each Objective)
End Loop

Fig. 1. MOMGA Pseudocode

The original messy GA consists of three distinct phases: Initialization Phase,
Primordial Phase, Juxtapositional Phase. In the initialization phase, the messy
GA begins by producing all building blocks of a specified size through a
deterministic process referred to as Partially Enumerative Initialization (PEI). The
population grows exponentially as the BB size, k, is increased. The MOMGA-II’s
population size initially is smaller and grows at a smaller rate. This was one
of the factors in developing the MOMGA-II. To evaluate the fitness of a BB, a
competitive template is used to fill in the unspecified bits prior to evaluation.
This ensures that each evaluation is of a fully specified string, through the BB
alone or in conjunction with the competitive template if necessary.
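
To give a feel for this growth, the sketch below counts the strings that a full enumeration of building blocks would produce. It assumes the counting usually quoted for the messy GA, namely one string for every choice of k loci out of the string length and every assignment of alleles to them; the function name is hypothetical and the 24-bit string length is the one used in the experiments reported later in this paper.

```python
from math import comb


def pei_population_size(string_length: int, bb_size: int, alphabet: int = 2) -> int:
    """Number of building blocks of a given size under full enumeration (assumed formula)."""
    return alphabet ** bb_size * comb(string_length, bb_size)


for k in range(1, 5):
    print(k, pei_population_size(24, k))
```

Under this assumption, each additional unit of BB size multiplies the initial population by roughly an order of magnitude for a 24-bit string.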

The primordial phase performs tournament selection on the population and 
reduces the population size if necessary. In the juxtapositional phase, the messy 
GA proceeds by building up the population through the use of the cut and 
splice recombination operator. The cut and splice recombination operator is used 
with the tournament thresholding selection operator. The process continues for 
a number of generations to yield strings with high fitness values for each fitness 
function. The combination of these three phases produces one era jbim] . The 
algorithm continues for the user specified number of eras. Observe that an epoch 
refers to the number of times that the phases are executed for the same set of 
eras. 
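
The cut and splice operator can be illustrated with the following simplified Python sketch. It assumes that a messy string is a variable-length list of (locus, allele) genes, that each parent is cut at a uniformly chosen point with a fixed probability, and that the resulting pieces are spliced pairwise with a fixed probability. The probabilities p_cut and p_splice are placeholders; in the messy GA literature the cut probability typically depends on the string length, which is not modelled here.

```python
import random


def cut(string, rng):
    """Split a messy string (list of (locus, allele) genes) at a random point."""
    if len(string) < 2:
        return string, []
    point = rng.randrange(1, len(string))
    return string[:point], string[point:]


def splice(left, right):
    """Concatenate two messy strings."""
    return left + right


def cut_and_splice(parent_a, parent_b, rng, p_cut=0.5, p_splice=1.0):
    """Cut each parent with probability p_cut, then splice the pieces pairwise."""
    pieces = []
    for parent in (parent_a, parent_b):
        if rng.random() < p_cut:
            pieces.extend(piece for piece in cut(parent, rng) if piece)
        else:
            pieces.append(parent)
    rng.shuffle(pieces)
    children = []
    while pieces:
        child = pieces.pop()
        if pieces and rng.random() < p_splice:
            child = splice(child, pieces.pop())
        children.append(child)
    return children
```

Every gene of the parents ends up in exactly one child, so splicing tends to lengthen strings over the generations, which is how partially specified building blocks grow towards fully specified individuals.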

The MOMGA is an extension of the mGA to the multiobjective arena. The 
MOMGA proceeds through the three phases of the mGA to constitute an era. 
The MOMGA incorporates competitive templates for each objective function. 
The templates are initially created randomly and following each era, the best 
found individual in the population, for each objective function, becomes the new 
competitive template for that objective function, and is used in the following era. 
More detailed discussion of the MOMGA architecture is presented in m 
4 MOMGA-II

We modified the MOMGA to use a probabilistic approach in initializing the
population. This is referred to as Probabilistically Complete Initialization (PCI) [3].
The probabilistic BB approach initializes the population by creating a controlled
number of BB clones of a specified size. These BBs then are filtered, through a
Building Block Filtering (BBF) phase, to probabilistically ensure that all of the
desired BBs are in the initial population. This approach should effectively reduce
the computational bottlenecks encountered with PEI by reducing the size of the
initial population required to obtain “good” statistical results when compared
with the mGA. The pseudocode for the MOMGA-II is presented in Figure 2.



For n = 1 to k
    Perform Probabilistically Complete Initialization
    Evaluate Each Population Member’s Fitness (w.r.t. k Templates)
    // Building Block Filtering Phase
    For i = 1 to Maximum Number of Building Block Filtering Generations
        If (Building Block Filtering Required Based Off of Input Schedule)
            Then Perform Building Block Filtering
        Else
            Perform Tournament Thresholding Selection
        Endif
    End Loop
    // Juxtapositional Phase
    For i = 1 to Maximum Number of Juxtapositional Generations
        Cut-and-Splice
        Evaluate Each Population Member’s Fitness (w.r.t. k Templates)
        Perform Tournament Thresholding Selection and Fitness Sharing
        Pknown(t) = Pcurrent(t) ∪ Pknown(t − 1)
    End Loop
    Update k Competitive Templates (Using Best Value Known in Each Objective)
End Loop



Fig. 2. MOMGA-II Pseudocode 



The fast messy GA (fmGA) consists of the following phases: Initialization Phase,
Building Block Filtering, Juxtapositional Phase. The fmGA differs from the mGA
in the Initialization phase and in the Primordial phase, which is replaced by the
Building Block Filtering phase. The initialization phase utilizes PCI instead of the PEI
implementation used in the mGA. The initial population is created randomly.

The BBF phase reduces the number of building blocks and stores the best
building blocks found. This filtering is accomplished through a schedule consisting
of the random deletion of bits in each of the chromosomes throughout the
input schedule. The schedule specifies the generations to conduct the random
deletion, the number of specified bits to delete from the chromosomes and the
number of juxtapositional generations to execute. The random deletion of bits
is alternated with tournament selection between the building blocks that have
been found to yield a population of “good” building blocks. This complex
parameter selection is a very critical element of the MOMGA and MOMGA-II. A
more detailed fmGA discussion of the PCI implementation and the theoretical
analysis of the schedule are presented in
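
The alternation of random bit deletion and selection can be sketched as follows. This is a deliberately simplified illustration, not the MOMGA-II implementation: the schedule is assumed to be a list giving, for each generation, the number of genes to delete (zero meaning that selection is performed instead), and plain binary tournament selection with a user-supplied fitness function stands in for tournament thresholding selection with competitive templates.

```python
import random


def delete_random_genes(string, n_delete, rng):
    """Randomly remove n_delete genes, keeping the order of the survivors."""
    keep = sorted(rng.sample(range(len(string)), len(string) - n_delete))
    return [string[i] for i in keep]


def tournament(population, fitness, rng):
    """Binary tournament selection; the population size is preserved."""
    return [max(rng.sample(population, 2), key=fitness) for _ in population]


def building_block_filtering(population, schedule, fitness, rng):
    """schedule[g] is the number of genes to delete at generation g, or 0 to select."""
    for n_delete in schedule:
        if n_delete > 0:
            population = [delete_random_genes(s, n_delete, rng) for s in population]
        else:
            population = tournament(population, fitness, rng)
    return population


rng = random.Random(0)
# Hypothetical fully specified 24-gene strings of (locus, allele) pairs.
population = [[(i, rng.randint(0, 1)) for i in range(24)] for _ in range(20)]
count_ones = lambda s: sum(allele for _, allele in s)  # stand-in fitness
filtered = building_block_filtering(population, [8, 0, 8, 0, 5], count_ones, rng)
print(len(filtered), {len(s) for s in filtered})
```

With this hypothetical schedule the 24-gene strings are reduced in three steps to building blocks of size 3, with a selection step after each of the first two deletions.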

The juxtapositional phase proceeds in the same manner as it did in the 
MOMGA. During this phase the building blocks that have been found through 
the Initialization phase and the BBF phase are recombined through the use of a 
cut and splice operator alternated with tournament selection with thresholding. 
This process builds longer population individuals towards becoming fully spec- 
ified while keeping the best members through the selection operator. Again the 
fitness evaluation is completed on fully specified individuals, through the use of 
the competitive templates if necessary. 

To evolve the MOMGA-II, the original MOMGA code is modified in the
initialization and primordial phases to reflect the fmGA implementation. The
nontrivial modifications changed only the Initialization and
Primordial phases of the MOMGA; code optimization was not addressed.
This is an important fact since the MOMGA-II is statistically shown to
be a more efficient algorithm than the MOMGA, while reusing much of the
same code as the original MOMGA. In terms of the competitive template, the
MOMGA-II operates in the same fashion as the MOMGA. Each of the strings
in the population is evaluated with respect to the different functions that one
is attempting to optimize. The different competitive templates are used to fully
specify individuals that have unspecified bits. In the case of overspecification,
where a gene location has multiple allele values, we use a left-to-right scheme
in which the first value encountered to specify the gene becomes the allele
value for that gene.
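
The handling of underspecification and overspecification described above can be sketched as follows, assuming a messy string is a list of (locus, allele) pairs and a competitive template is a fully specified list of bits. The function name and the example data are hypothetical.

```python
def express(messy_string, template):
    """Fill a competitive template with the genes of a messy string."""
    chromosome = list(template)  # unspecified loci keep the template value
    seen = set()
    for locus, allele in messy_string:  # scan left to right
        if locus not in seen:  # the first value encountered wins
            chromosome[locus] = allele
            seen.add(locus)
    return chromosome


template = [0, 0, 0, 0, 0, 0]
building_block = [(4, 1), (1, 1), (4, 0)]  # locus 4 is overspecified
print(express(building_block, template))  # [0, 1, 0, 0, 1, 0]
```

The resulting fully specified string is what would be passed to each objective function, once per competitive template.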



5 Test Suite MOPs 

In comparing the performance of numerous MOEAs with different strengths and
weaknesses, one needs to utilize a variety of multiobjective problems and metrics
suitable for making common comparisons. MOP test suites can be constructed
from known functions and generated functions included in a variety of classes
(unconstrained, constrained, NP-complete, ...). The main goal of our extensive
test suite construction, experiments and analysis is to compare MOMGA and
MOMGA-II effectiveness and efficiency with a variety of well designed MOEAs.

A number of appropriate unconstrained numerical MOPs were meticulously
selected as part of our test suite of functions to be able to make a statistical
comparison between the selected MOEAs. These MOPs exhibit varying
complexity levels and Pareto front (phenotype) characteristics such as concavity,
convexity, connectedness, disconnectedness, scalability, uniformity and
nonuniformity of points. Thus, each of the MOPs in our standard test suite was very
carefully selected from a large variety of well known MOPs. Note that such functional
characteristics have also been applied to the decision variable domain (genotype). We
concentrate on five of the MOPs in a standard test suite which represent both
minimization and maximization functions as well as a variety of the previous
characteristics. Two-objective problems are used here for ease of presentation
and to provide critical insight to MOEA performance. The five MOPs selected
are Schaffer’s labeled as MOP1, Fonseca’s # 2 labeled as MOP2, Poloni’s
labeled as MOP3, Kursawe’s labeled as MOP4, and Deb’s labeled as MOP6.


MOP1 (Ptrue connected, PFtrue convex)

    Minimize    $f_1(x) = x^2$,
                $f_2(x) = (x - 2)^2$
    Subject to  $-10^5 \le x \le 10^5$

MOP2 (Ptrue connected, scalable, PFtrue concave, uniform)

    Minimize    $f_1(x) = 1 - \exp\left(-\sum_{i=1}^{n} \left(x_i - \frac{1}{\sqrt{n}}\right)^2\right)$,
                $f_2(x) = 1 - \exp\left(-\sum_{i=1}^{n} \left(x_i + \frac{1}{\sqrt{n}}\right)^2\right)$
    Subject to  $-4 \le x_i \le 4;\ i = 1, 2, 3$

MOP3 (Ptrue disconnected, PFtrue disconnected - 2, nonuniform)

    Maximize    $f_1(x, y) = -[1 + (A_1 - B_1)^2 + (A_2 - B_2)^2]$,
                $f_2(x, y) = -[(x + 3)^2 + (y + 1)^2]$
    Subject to  $-3.1416 \le x, y \le 3.1416$,
                $A_1 = 0.5 \sin 1 - 2 \cos 1 + \sin 2 - 1.5 \cos 2$,
                $A_2 = 1.5 \sin 1 - \cos 1 + 2 \sin 2 - 0.5 \cos 2$,
                $B_1 = 0.5 \sin x - 2 \cos x + \sin y - 1.5 \cos y$,
                $B_2 = 1.5 \sin x - \cos x + 2 \sin y - 0.5 \cos y$

MOP4 (Ptrue disconnected, scalable, PFtrue disconnected - 3, nonuniform)

    Minimize    $f_1(x) = \sum_{i=1}^{n-1} \left(-10\, e^{-0.2 \sqrt{x_i^2 + x_{i+1}^2}}\right)$,
                $f_2(x) = \sum_{i=1}^{n} \left(|x_i|^{0.8} + 5 \sin(x_i)^3\right)$
    Subject to  $-5 \le x_i \le 5;\ i = 1, 2, 3$

MOP6 (Ptrue disconnected, PFtrue disconnected - 4, scalable, nonuniform)

    Minimize    $f_1(x, y) = x$,
                $f_2(x, y) = (1 + 10y) \cdot \ldots$
    Subject to  $0 \le x, y \le 1$
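
For readers who wish to experiment with these problems, the following Python sketch encodes three of them, reading the definitions above as the standard Schaffer, Fonseca (second) and Kursawe functions. The function names are illustrative, MOP3 and MOP6 are omitted, and the bound constraints are assumed to be enforced by the caller.

```python
import math


def mop1(x):
    """Schaffer's function: one decision variable, two objectives."""
    return x ** 2, (x - 2) ** 2


def mop2(x):
    """Fonseca's second function for a sequence of n decision variables."""
    s = 1 / math.sqrt(len(x))
    f1 = 1 - math.exp(-sum((xi - s) ** 2 for xi in x))
    f2 = 1 - math.exp(-sum((xi + s) ** 2 for xi in x))
    return f1, f2


def mop4(x):
    """Kursawe's function for a sequence of n decision variables."""
    f1 = sum(-10 * math.exp(-0.2 * math.sqrt(a * a + b * b))
             for a, b in zip(x, x[1:]))
    f2 = sum(abs(xi) ** 0.8 + 5 * math.sin(xi) ** 3 for xi in x)
    return f1, f2


print(mop1(1.0))
print(mop2([0.0, 0.0, 0.0]))
print(mop4([0.0, 0.0, 0.0]))
```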
Utilization of a test suite is advantageous to the EA community in the fact that
it presents data that is baselined from a standard test suite. Another philosophical
development of a test suite reflects similar functionality [8]. Note that
their “generated” test functions 1, 6, and 3 have quite similar characteristics to
our MOP1, MOP3 and MOPs 4 & 6 respectively. MOEAs all attempt to find
the global maximum or minimum curve or surface, depending on the optimization
criteria, through the use of a Pareto Front (PF) analysis. In many cases
the true global optimum is not found by a MOEA; however, a “good” analysis of
the PFknown values is necessary to determine if they are close enough to the
PFtrue values to be satisfactory solutions to a MOP.

In each of the five MOPs tested, the PFtrue was determined through a total
fine-resolution grid enumeration of the search space. A program was constructed
and run on powerful high performance computer systems to find Ptrue
and PFtrue for each of the five MOPs tested in this paper. This results in a total
enumeration of the MOPs at a given computational resolution.
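
The idea of such an enumeration can be illustrated on a very small scale. The sketch below evaluates a one-variable, two-objective problem on a regular grid and keeps the nondominated grid points; the quadratic objectives, the narrow range and the coarse step are assumptions chosen to keep the example readable and bear no relation to the resolution actually used for the test suite.

```python
def dominates(u, v):
    """u dominates v (minimisation): no worse everywhere, better somewhere."""
    return all(a <= b for a, b in zip(u, v)) and any(a < b for a, b in zip(u, v))


def enumerate_front(f, lower, upper, steps):
    """Return the nondominated grid points and their objective vectors."""
    xs = [lower + i * (upper - lower) / steps for i in range(steps + 1)]
    vectors = [f(x) for x in xs]
    p_true = [x for x, fx in zip(xs, vectors)
              if not any(dominates(fy, fx) for fy in vectors)]
    return p_true, [f(x) for x in p_true]


p_true, pf_true = enumerate_front(lambda x: (x * x, (x - 2) ** 2), -3.0, 5.0, 16)
print(p_true)
print(pf_true)
```

The pairwise dominance check is quadratic in the number of grid points, which is one reason why a fine-resolution enumeration calls for substantial computing resources.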

A constrained MOP, MOP-C1, from our constrained numerical test suite
(MOP-C1, MOP-C2, MOP-C3) was also employed. These three MOPs are
Binh’s second labeled as MOP-C1, Osyczka’s second labeled as MOP-C2 and
Viennet’s fourth labeled as MOP-C3. Each of these MOPs was carefully selected
for its respective attributes and constraints from an extensive set of
known functions. They are currently being tested. Ptrue and PFtrue for all these
constrained MOPs are presented in |p. MOP-Cl formulation is as follows: 

MOP-C1 (Ptrue connected, PFtrue convex)

    Minimize    $f_1(x, y) = 4x^2 + 4y^2$,
                $f_2(x, y) = (x - 5)^2 + (y - 5)^2$
    Subject to  $0 \le x \le 5$, $0 \le y \le 3$,
                $0 \ge (x - 5)^2 + y^2 - 25$,
                $0 \ge -(x - 8)^2 - (y + 3)^2 + 7.7$



6 Performance Metrics 

In order to quantitatively compare our MOEA performance with others, statistical
analysis of the experimental results and associated observations are presented.
Since no single metric can represent total performance of an MOEA,
a series of appropriate metrics is used to measure the performance in the
phenotype domain. MOEA metrics should consist of comparisons of 1) the
PFknown values to the PFtrue values to calculate generational distances, 2)
spacing along the PFknown Pareto front, and 3) the range of values for each
objective function [3,6]. Genotype metrics can also be considered, including the
overall nondominated vector generation measure. The three main metrics that
we use to compare the various MOEAs are generational distance, spacing and
the overall nondominated vector generation. Of these three, the only metric that
requires one to know PFtrue is generational distance. This is important since
the motivation of the development of these metrics was to create a set of metrics
that can be applied to real-world problems where PFtrue is not known. More
formal mathematical definitions can be found in [3].

Generational Distance represents how “far” PFknown is from PFtrue and is
defined as:

$$G \triangleq \frac{\left(\sum_{i=1}^{n} d_i^p\right)^{1/p}}{n}, \qquad (1)$$

where n is the number of vectors in PFknown, we choose p = 2 (RMS), and $d_i$ is
the Euclidean distance (in objective space) between each vector and the nearest
member of PFtrue [1].
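
A direct transcription of this definition into Python might look as follows. It is a minimal sketch that assumes both fronts are given as lists of objective vectors; the function name is illustrative, and `math.dist` requires Python 3.8 or later.

```python
import math


def generational_distance(pf_known, pf_true, p=2):
    """Generational distance of PFknown from PFtrue, following Eq. (1)."""
    distances = [min(math.dist(v, t) for t in pf_true) for v in pf_known]
    return sum(d ** p for d in distances) ** (1 / p) / len(pf_known)


pf_true = [(0.0, 0.0), (0.0, 4.0)]
pf_known = [(0.0, 0.0), (3.0, 4.0)]
print(generational_distance(pf_known, pf_true))  # 1.5
```

In the example the first vector lies on PFtrue and the second is at distance 3 from its nearest member, giving (0² + 3²)^(1/2) / 2 = 1.5.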

The spacing metric measures the range (distance) variance of neighboring
vectors in PFknown. This metric is defined as:

$$S \triangleq \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} \left(\bar{d} - d_i\right)^2}, \qquad (2)$$

where $d_i = \min_j \left(|f_1^i(x) - f_1^j(x)| + |f_2^i(x) - f_2^j(x)|\right)$, $i, j = 1, \ldots, n$, $\bar{d}$ is the
mean of all $d_i$, and n is the number of vectors in PFknown.
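
The spacing metric can be sketched in Python in the same way. The sketch assumes that the minimum in the definition of each distance is taken over the other vectors of the front (j different from i), since otherwise every distance would be zero; the function name is illustrative.

```python
import math


def spacing(pf_known):
    """Spacing of a front, following Eq. (2), with city-block neighbour distances."""
    n = len(pf_known)
    d = [min(sum(abs(a - b) for a, b in zip(u, v))
             for j, v in enumerate(pf_known) if j != i)
         for i, u in enumerate(pf_known)]
    mean = sum(d) / n
    return math.sqrt(sum((mean - di) ** 2 for di in d) / (n - 1))


print(spacing([(0, 3), (1, 2), (2, 1), (3, 0)]))  # evenly spaced front
print(spacing([(0, 0), (1, 1), (3, 3)]))          # unevenly spaced vectors
```

A value of zero indicates that all vectors are equidistant from their nearest neighbours; larger values indicate a less uniform spread.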

The overall nondominated vector generation (ONVG) metric measures the
total number of nondominated vectors found during MOEA execution and is
defined as:

$$\mathrm{ONVG} \triangleq |PF_{known}|. \qquad (3)$$

Although other metrics such as error ratio, max error, hyperarea ratio, 
progress measure, and nondominated vector ratios have been employed for this 
standard MOP test suite, the three metrics (G, S, ONVG) used are quantita- 
tively quite adequate for statistical MOEA comparison m- Such metrics can 
provide insight to more subtle MOEA aspects for given MOPs. 

Note that the discretization of the continuous decision variable domain can
be critical in finding the MOP Pareto front. That is, if only a very few decision
variable values relate to a particular objective function optimum, the statistical
chance of generating these values is small assuming a uniform distribution of
values. The resulting PFknown could thus be a considerable relative distance from
PFtrue, resulting in an unacceptable solution. Some say that this is a deception
problem, but in reality, it is a numerical analysis problem!

One also should observe that some authors use straight line approximations
to the continuous Pareto front if PFtrue is functionally unknown in generating
metric values. A better method, which we employ for MOP-C1, is to use
high-order polynomial approximations minimizing the error over the discrete PFtrue
segments, which gives more realistic results.



7 Results and Statistical Analysis 

Relative comparison performance data for a set of MOEAs and MOPs can be
presented in a number of formats. They include such presentations as individual
metric bars with average, median, standard deviation, max and min or tables
listing explicit values of these terms, compact scatter diagrams, or Pareto fronts
and Pareto solutions. In any case, the use of explicit hypothesis testing should
reflect the qualitative evaluation of such comparative presentations.

Since MOEA observations are not normally distributed, parametric mean
comparisons are not possible and therefore non-parametric statistical techniques
are employed. The Kruskal-Wallis H-test is thus appropriate for testing the
hypothesis that the probability distributions of the MOEA results for a
specific MOP differ, at an observed significance level of 0.1. This technique then
leads to comparing only two MOEAs at a time; the hypothesis that the probability
distributions of two MOEA results applied to a MOP differ is tested with the
Wilcoxon rank sum test at a significance level of 0.2. Details of assumptions and requirements
for this type of hypothesis testing can be found in m For the various metrics, 
MOEA comparisons are presented using this hypothesis testing technique. 
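
For reference, the two rank statistics involved can be computed as in the following Python sketch. It is a plain textbook formulation under stated assumptions: tied observations share their average rank, no tie correction is applied to H, and the decision step (comparing H with a chi-square quantile with one degree of freedom fewer than the number of groups, or the rank sum with its reference distribution) is left to a statistics package. The function names are illustrative.

```python
def midranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def kruskal_wallis_h(*samples):
    """Kruskal-Wallis H statistic for two or more samples (no tie correction)."""
    pooled = [v for s in samples for v in s]
    ranks = midranks(pooled)
    n_total = len(pooled)
    h, start = 0.0, 0
    for s in samples:
        group_rank_sum = sum(ranks[start:start + len(s)])
        h += group_rank_sum ** 2 / len(s)
        start += len(s)
    return 12 / (n_total * (n_total + 1)) * h - 3 * (n_total + 1)


def rank_sum(sample_a, sample_b):
    """Wilcoxon rank sum statistic: sum of the pooled ranks of the first sample."""
    ranks = midranks(list(sample_a) + list(sample_b))
    return sum(ranks[:len(sample_a)])


# Hypothetical metric values for three MOEAs on one MOP.
print(kruskal_wallis_h([1, 2, 3], [4, 5, 6], [7, 8, 9]))
print(rank_sum([1, 2, 3], [4, 5, 6]))
```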

MOMGA-II effectiveness is compared with the results of the original
MOMGA, as well as some other well-known MOEAs, for the numerical
unconstrained test function suite and one of our constrained test suite functions. In
our previous research on the MOMGA, these MOEA test suites were utilized to
allow for absolute comparisons of different MOEA approaches.

The results presented here reflect the output of each MOEA using the default 
values that have been specified in P . All of the MOMGA and MOMGA-II results 
are taken over 10 data runs in order to provide enough samples for statistical 
comparison. Each of these runs consists of 3 eras, with 20 generations in each of the
first 2 eras and 1 generation in the last era. The MOMGA and MOMGA-II utilize a
string length of 24 bits and use a binary representation for the chromosomes. The
MOMGA-II differs from the MOMGA in the BBF schedule, which dictates the
number of generations in which BBF must take place. Where the MOMGA used the
mGA calculation to determine the number of primordial generations to execute,
the MOMGA-II always executes 9 generations of BBF and tournament selection,
which leaves 11 generations for the juxtapositional phase. The MOMGA, on the
other hand, may execute as many as 19 generations of the juxtapositional phase.
This is an important fact to note since the creation of the solution strings takes
place through the cut and splice operation of the juxtapositional phase. We
show that the MOMGA-II obtains statistically similar results to the MOMGA
with far fewer juxtapositional generations. This leads us to state that the
MOMGA-II is finding the “good” building blocks that the MOMGA finds, with the
MOMGA-II finding them in less execution time.

Table 1 presents the execution time of the MOMGA and MOMGA-II across
the various MOPs tested. As can be seen, the MOMGA-II is much more efficient
than the MOMGA on the tested MOPs, with a 97% or higher efficiency improvement
on all but one MOP over the MOMGA. The reason is that as the size of
the BBs increases, the MOMGA must spend more time than the MOMGA-II in
the creation and evaluation of the BBs. The MOMGA-II does not create every
BB but probabilistically creates “good” BBs through the random deletion of
bits from the population members until the correct BB size (string length) is
obtained. The results from Figures 4, 5, and 6 show that the MOMGA-II is able
to find “good” BBs since the MOMGA-II statistically obtains similar results to
the MOMGA over the tested MOPs.

J.B. Zydallis, D.A. Van Veldhuizen, and G.B. Lamont



Table 1. Timing and Efficiency

MOMGA
Multiobjective   Maximum   Minimum   Median    Average   Standard
Problem          Time      Time      Time      Time      Deviation
MOP1              44.60      1.75      2.80      7.09     13.23
MOP2             168.66      2.43     60.09     64.84     58.14
MOP3             193.79     41.49     55.51     80.39     59.85
MOP4             198.29      3.03    180.26    125.03     83.86
MOP6             241.84      1.69    112.27    120.38    113.32

MOMGA-II
Multiobjective   Maximum   Minimum   Median    Average   Standard
Problem          Time      Time      Time      Time      Deviation
MOP1               1.25      1.06      1.12      1.14      0.07
MOP2               1.52      1.38      1.43      1.44      0.04
MOP3               1.49      1.31      1.42      1.41      0.05
MOP4               1.43      1.28      1.36      1.35      0.05
MOP6               1.17      1.02      1.13      1.11      0.05

                 MOP1      MOP2      MOP3      MOP4      MOP6
Efficiency %     0.6011    0.9762    0.9745    0.9925    0.9900
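
The paper does not state how the efficiency row is computed. The tabulated values appear to be consistent, up to rounding of the printed times, with one minus the ratio of the MOMGA-II median time to the MOMGA median time; the following Python sketch checks this reading, which is an inference from the table rather than a definition given by the authors.

```python
# Median execution times copied from Table 1.
momga_median = {"MOP1": 2.80, "MOP2": 60.09, "MOP3": 55.51, "MOP4": 180.26, "MOP6": 112.27}
momga2_median = {"MOP1": 1.12, "MOP2": 1.43, "MOP3": 1.42, "MOP4": 1.36, "MOP6": 1.13}


def efficiency(slow, fast):
    """Fraction of the slower algorithm's time saved by the faster one (assumed definition)."""
    return 1.0 - fast / slow


for mop in momga_median:
    print(mop, round(efficiency(momga_median[mop], momga2_median[mop]), 4))
```

Under this reading the values are fractions rather than percentages, so 0.9762 corresponds to the “97% or higher” improvement quoted in the text.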



Figure 0 visually illustrates the algorithms performance by plotting each 
of the PF known vectors found by the MOMGA and MOMGA-II. This figure 
shows us that the MOMGA-II has found points on the Pareto Front. Since the 
algorithm was executed with a non-optimized BBF schedule, the results that we 
have obtained are promising. The MOMGA-II has found the Pareto Front in 
less time and with fewer building blocks than the MOMGA. What this and the 
further results we present here illustrate is that the ideas of PGI and BBF can 
be extended to the MOP Domain. 

In the following figures (Figures 4, 5, and 6), the bars represent the mean 
metric performance, μ, over 10 data runs with the standard deviation error 
bars shown. Figure 4 illustrates the comparison of the results that five different 
MOEAs have obtained over the test suite of five MOPs. This figure presents the 
results of the various MOEAs over the Generational Distance metric. The pairwise 
Wilcoxon rank sum test indicates that the MOGA, MOMGA, MOMGA-II, 
and NPGA generate better results than the NSGA. We center our discussion 
on the comparison of the MOMGA with the MOMGA-II since other compar- 
isons were made previously in [ I I2j . With the Wilcoxon rank sum test, MOMGA 
and MOMGA-II generational distance results are statistically different although 
both generate good solutions. 

Fig. 3. PFknown Comparison 



Figure 5 presents the overall MOEA performance with regard to the spacing 
metric. The rank sum test indicates that the NSGA and the MOMGA-II are 
statistically worse than the others, including the MOMGA. This was anticipated 
since the MOMGA-II by construction may have fewer PFknown points. The last 
metric figure also reflects this phenomenon. Figure 6 presents the overall MOEA 
performance with respect to the ONVG metric. With the rank sum test, the 
MOMGA-II obtains statistically better results than the NSGA but somewhat 
worse than the others, including the MOMGA. 

The results of the MOMGA and MOMGA-II as applied to our constrained 
test function are presented in Table 2. Values indicate that the MOMGA and 
MOMGA-II obtain statistically the same results for the Generational Distance. 
Also, the rank sum test implies similar distributions using this metric. This is 
an important fact since this metric is a measure of how close the algorithm has 
come to PFtrue. The other two metrics, spacing and ONVG, show the MOMGA 
obtaining statistically different and better results with the rank sum test. As 
stated before, this is due to the increased number of juxtapositional phases that 
the MOMGA executes, allowing it to find more points along the front than the 
MOMGA-II. 

The MOMGA-II also performs “statistically better” than the NSGA-II, 
PAES, and SPEA on MOPs 2, 3, and 4 in terms of the generational distance 
metric 0. The NSGA-II obtains larger mean values than the MOMGA-II does 
as presented in |^. Note that the NSGA-II results are based on a variant of 
the spacing metric using absolute distance; i.e., p = 1 not 2 and thus weighting 
larger error values (less than one) higher. 

The goal was to statistically compare the results obtained from the MOMGA-II 
to those obtained from the MOMGA. Figures 4, 5, and 6 illustrate the results 
that the five MOEAs achieved versus the three metrics utilized. The MOMGA-II 
tends to be as effective as the original MOMGA, although the Wilcoxon rank sum 
test indicates statistical differences with regard to the aforementioned numerical 
MOPs using the above metrics. An efficiency improvement was anticipated in 
fewer individual fitness calculations due to the PGI modification. 



Fig. 4. Overall Generational Distance Performance 

Fig. 5. Overall Spacing Performance 



8 Conclusions 



Through initial testing, the MOMGA-II has been shown to be more efficient than 
its predecessor, with only minor statistical differences in results, on our 
test suite of sufficiently complex MOPs. This is as expected since we have 







Fig. 6. Overall ONVG Performance 



Table 2. Constrained MOP 

Cardinality
MOEA        Maximum    Minimum    Median     Average    Stand Dev
MOMGA       1823.00     805.00    1529.00    1418.90     346.60
MOMGA-II     443.00     347.00     396.50     398.00      32.73

Spacing
MOEA        Maximum    Minimum    Median     Average    Stand Dev
MOMGA        0.2825     0.1150     0.1509     0.1717     0.0600
MOMGA-II     0.8666     0.5392     0.7182     0.7007     0.1048

Generational Distance
MOEA        Maximum    Minimum    Median     Average    Stand Dev
MOMGA        0.0232     0.0096     0.0140     0.0149     0.0042
MOMGA-II     0.0250     0.0138     0.0190     0.0193     0.0035



moved from a PEI implementation to a PCI initialization of the population and 
have reduced the computation bottlenecks imposed by PEI. Additional focus 
on input parameters may yield better performance results, as this testing was 
completed utilizing generic test parameter values. MOMGA-II results indicate 
that our modifications have increased the efficiency of the explicit BB MOMGA. 
Moreover, MOMGA and MOMGA-II are statistically different, but more effective 
than some other MOEAs across the test suites. Thus MOMGA-II should be 
considered as an efficient and effective MOEA for real-world applications. 

Future work will include more extensive MOP test suite statistical evaluation 
with other genotype and phenotype metrics, more detailed comparison to other 
evolving MOEAs, and inclusion of other unconstrained, constrained, generated, 
NP-Complete (knapsack, TSP, etc.), and real-world MOPs in larger test suites. In 
addition to MOMGA-II parametric studies, variations of competitive template 
structures will be analyzed along with parallelization of the MOMGA-II. 

Abstract. The paper presents a comparative experiment with four multiple 
objective evolutionary algorithms on a real life combinatorial optimization 
problem. The test problem corresponds to the design of a distribution system. 
The experiment compares performance of a Pareto ranking based multiple 
objective genetic algorithm (Pareto GA), multiple objective multiple start local 
search (MOMSLS), multiple objective genetic local search (MOGLS) and an 
extension of Pareto GA involving local search (Pareto GLS). The results of the 
experiment clearly indicate that the methods hybridizing recombination and 
local search operators by far outperform methods that use one of the operators 
alone. Furthermore, MOGLS outperforms Pareto GLS. 



1 Introduction 

Multiple objective metaheuristics, e.g. multiple objective versions of genetic 
algorithms, simulated annealing or tabu search, constitute one of the most active 
directions of research in multiple objective optimization [17]. Methods of this kind 
are claimed to be able to solve both continuous nonlinear optimization problems and 
large scale combinatorial optimization problems. A significant number of real life 
applications of multiple objective metaheuristics to continuous nonlinear problems 
(e.g. in engineering design) have been reported in the literature. At the same time the 
number of successful applications to real life multiple objective combinatorial 
optimization problems seems to be rather limited. In our opinion, this is not due 
to the lack of potential applications but due to the difficulty of the problems. 

In this paper we describe a comparative experiment on a real life combinatorial 
optimization problem corresponding to a distribution system design. The problem was 
analyzed in a project in which the authors were involved. The problem shares some 
characteristic features with many other real life problems: 

• it involves objectives having different mathematical formulations (many test 
problems involve objectives that differ by parameters’ values but have the same 
mathematical form, see e.g. [15]), 

• although it has some similarities with the classic combinatorial optimization 
problems, e.g. location problems, it has a number of specific elements that make the 
use of standard operational research methods infeasible. 



* This work has been supported by KBN grant No. 8T11F00619. 

E. Zitzler et al. (Eds.): EMO 2001, LNCS 1993, pp. 241-255, 2001. 

© Springer-Verlag Berlin Heidelberg 2001 





242 A. Jaszkiewicz, M. Hapke, and P. Kominek 



Four multiple objective metaheuristics have been tested on this problem. We used 
the Pareto ranking based multiple objective genetic algorithm proposed by Fonseca and 
Fleming (Pareto GA), multiple objective multiple start local search (MOMSLS) 
with random weight vectors, multiple objective genetic local search (MOGLS) 
and an extension of Pareto GA involving local search (Pareto GLS). Two different 
quality measures were used in this experiment: coverage of two approximations to 
the nondominated set and estimation of the expected value of the weighted Chebycheff 
scalarizing function on an approximation. In addition, the running time of the methods 
was taken into account. All the methods used the same solution encoding, the same 
recombination and local search operators and the same way of constructing initial 
solutions. All the methods were implemented with the MOEALIB++ library and 
shared most of the C++ code. 

The paper is organized in the following way. In the next section some basic 
definitions are given. The distribution system design problem is described in the third 
section. In the fourth section the four methods used in the experiment are briefly 
described. The fifth section describes customization of the methods to the problem. 
Design and the results of the experiment are characterized in the sixth section. The 
last section contains conclusions. 

2 Problem Statement and Basic Definitions 

The general multiple objective optimization (MOO) problem is formulated as: 

$$\min \{ f_1(x) = z_1, \ldots, f_J(x) = z_J \} \qquad \text{(P1)}$$

$$\text{s.t. } x \in D,$$

where solution $x = [x_1, \ldots, x_I]$ is a vector of decision variables and $D$ is the 
set of feasible solutions. If each decision variable takes discrete values from a finite 
set, the problem is combinatorial. 

The image of a solution $x$ in the objective space is a point 
$z^x = [z^x_1, \ldots, z^x_J] = f(x)$, such that $z^x_j = f_j(x)$, $j = 1, \ldots, J$. 

Point $z^1$ dominates $z^2$, $z^1 \succ z^2$, if $\forall j \; z^1_j \le z^2_j$ and 
$z^1_j < z^2_j$ for at least one $j$. Solution $x^1$ dominates $x^2$ if the image of 
$x^1$ dominates the image of $x^2$. 

A solution $x \in D$ is efficient (Pareto-optimal) if there is no $x' \in D$ that dominates 
$x$. A point being an image of an efficient solution is called nondominated. The set of all 
efficient solutions is called the efficient set. The image of the efficient set in the 
objective space is called the nondominated set or Pareto front. 

An approximation to the nondominated set is a set $A$ of points (and corresponding 
solutions) such that $\neg \exists \, z^1, z^2 \in A$ such that $z^1 \succ z^2$, i.e. set $A$ is 
composed of mutually nondominated points. 
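The dominance relation and the notion of mutually nondominated points can be illustrated with a short sketch. It assumes minimization of all objectives, as in (P1), and represents points as tuples of objective values; the function names are illustrative only.

```python
def dominates(z1, z2):
    """True if point z1 dominates point z2 (all objectives minimized)."""
    no_worse = all(a <= b for a, b in zip(z1, z2))
    strictly_better = any(a < b for a, b in zip(z1, z2))
    return no_worse and strictly_better


def nondominated(points):
    """Return the points that are not dominated by any other point in the list."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


if __name__ == "__main__":
    sample = [(1, 3), (2, 2), (3, 1), (2, 3)]
    # (2, 3) is dominated by both (1, 3) and (2, 2) and is filtered out.
    print(nondominated(sample))
```

The filter is quadratic in the number of points, which is adequate for illustration but not tuned for large sets.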

The point $z^*$ composed of the best attainable objective function values is called the 
ideal point: 

$$z^*_j = \min \{ f_j(x) \mid x \in D \}, \quad j = 1, \ldots, J.$$

The point $z_*$ composed of the worst attainable objective function values in the 
nondominated set is called the nadir point. 

Range equalization factors [ch. 8.4.2] are defined in the following way: 

$$\pi_j = \frac{1}{R_j}, \quad j = 1, \ldots, J,$$

where $R_j$ is the (approximate) range of objective $j$ in the set N, D or A. Objective 
function values multiplied by range equalization factors are called normalized 
objective function values. 

Weighted Chebycheff scalarizing functions are defined in the following way: 

$$s_\infty(z, z^0, \Lambda) = \max_j \{ \lambda_j (z_j - z^0_j) \} = \max_j \{ \lambda_j (f_j(x) - z^0_j) \},$$

where $z^0$ is a reference point and $\Lambda = [\lambda_1, \ldots, \lambda_J]$ is a weight 
vector such that $\lambda_j \ge 0 \; \forall j$. 

Each weighted Chebycheff scalarizing function of this type has at least one global 
optimum (minimum) belonging to the set of efficient solutions. For each efficient 
solution $x$ there exists a weighted Chebycheff scalarizing function $s_\infty$ such that $x$ is 
a global optimum (minimum) of $s_\infty$ [ch. 14.8]. 

Linear scalarizing functions are defined in the following way: 

$$s_1(z, \Lambda) = \sum_{j=1}^{J} \lambda_j z_j = \sum_{j=1}^{J} \lambda_j f_j(x).$$

Each weighted linear scalarizing function of this type has at least one global 
optimum (minimum) belonging to the set of efficient solutions [ch. 14.8]. 

Weight vectors that meet the following conditions: 

$$\forall j \; \lambda_j \ge 0, \qquad \sum_{j=1}^{J} \lambda_j = 1,$$

are called normalized weight vectors. 
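As an illustration of the definitions in this section, the sketch below evaluates the weighted Chebycheff and linear scalarizing functions and applies range equalization. The function names and the sample numbers are assumptions made for the example; they do not come from the paper or from MOEALIB++.

```python
def chebycheff(z, z_ref, weights):
    """Weighted Chebycheff scalarizing function: max_j lambda_j * (z_j - z0_j)."""
    return max(w * (zj - rj) for w, zj, rj in zip(weights, z, z_ref))


def linear(z, weights):
    """Weighted linear scalarizing function: sum_j lambda_j * z_j."""
    return sum(w * zj for w, zj in zip(weights, z))


def normalize(z, ranges):
    """Multiply each objective value by its range equalization factor 1 / R_j."""
    return [zj / r for zj, r in zip(z, ranges)]


if __name__ == "__main__":
    point = (3.0, 5.0)
    reference = (1.0, 1.0)     # e.g. an approximation of the ideal point
    weights = (0.5, 0.5)       # a normalized weight vector
    print(chebycheff(point, reference, weights))   # largest weighted deviation
    print(linear(point, weights))                  # weighted sum
    print(normalize(point, (2.0, 4.0)))            # normalized objective values
```

Both functions are to be minimized; the Chebycheff form penalizes only the worst weighted deviation from the reference point, whereas the linear form aggregates all objectives.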



3 The Distribution System Design Problem 

An international company operating in Poland is going to completely rebuild its 
distribution system. The company entered the Polish market at the beginning of the 1990s 
and acquired two production plants located in two large Polish cities. The company 
focuses on the production and sales of cosmetics, detergents and washing articles. The 
annual turnover of the company is roughly $100 million. 85% of the turnover is 
generated on the Polish local market and the remaining 15% is exported to Eastern 
European countries. On the Polish market the company’s products are mainly sold to 
wholesalers (60% of sales) and the chains of large retailers (hypermarkets, 
supermarkets; 20% of sales). 

The distribution system is going to be composed of up to 39 distribution centers 
(warehouses) placed in some predefined potential locations. Each distribution center 
will deliver the company’s products to clients located in one or more of 49 regions of 
Poland. The regions correspond to administrative districts. The goal of the study is to 
select appropriate locations for distribution centers and to assign regions to the 
distribution centers. The following three objectives are taken into account: 

• minimization of the total annual distribution cost, including transportation cost, 
warehousing cost and cost of locked-up capital, 

• minimization of the worst case riding time, 

• minimization of the number of distribution centers (minimization of the 
complexity of the system). 

Below we present a mathematical programming formulation of the problem. 



3.1 Data 

The following data are used in the model: 

$I$ - the number of potential locations of distribution centers, 

$J$ - the number of regions that have to be assigned to the distribution centers, 

$L$ - the number of vehicle types used for transportation from production plants to 
distribution centers, 

$DA_j$ ($DB_j$) - annual demand of region $j$ for products of production plant A (B) in 
[pallets], $j = 1, \ldots, J$, 

$TC_{ij}$ - average transportation cost from distribution center at location $i$ to region $j$ in 
[PLN/pallet], $i = 1, \ldots, I$, $j = 1, \ldots, J$, 

$TCA_{il}$ ($TCB_{il}$) - average transportation cost from production plant A (B) to distribution 
center at location $i$ by vehicle of $l$-th type in [PLN/pallet], $i = 1, \ldots, I$, $l = 1, \ldots, L$, 

$TT_{ij}$ - average travel time from distribution center at location $i$ to the first customer in 
region $j$ in [min], $i = 1, \ldots, I$, $j = 1, \ldots, J$, 

$PHC_i$ - cost of pallet handling in distribution center at location $i$, $i = 1, \ldots, I$, 

$CRT$ - current average pallet rotation time in [days], i.e. the average number of days 
that a pallet spends in the current distribution system, 

$CCA$ ($CCB$) - average daily cost of locked-up capital per pallet produced in production 
plant A (B) in [PLN/day], 

$MSSA$ ($MSSB$) - annual cost of minimum safety stock of pallets produced in 
production plant A (B) to be maintained in each distribution center, 

$ML_l$ - capacity of vehicles of $l$-th type used for transportation from production plants 
to distribution centers, $l = 1, \ldots, L$, 

$DY$ - average number of working days in a year. 



3.2 Decision Variables 

Two groups of decision variables are considered: 

$y_i \in \{0, 1\}$, $i = 1, \ldots, I$, equal to one if the distribution center at location $i$ is 
included in the plan, and 0 otherwise. 

$x_{ij} \in \{0, 1\}$, $i = 1, \ldots, I$, $j = 1, \ldots, J$, equal to one if region $j$ is assigned to 
the distribution center at location $i$, and 0 otherwise. 

3.3 Constraints 

The first group of constraints assures that regions are assigned only to distribution 
centers included in the plan: 

$$x_{ij} \le y_i, \quad i = 1, \ldots, I, \; j = 1, \ldots, J.$$

The second group of constraints assures that each region is assigned to exactly one 
distribution center: 

$$\sum_{i=1}^{I} x_{ij} = 1, \quad j = 1, \ldots, J.$$

3.4 Objective Functions 



Three objectives are considered in the model: 

• TDC - total annual cost of distribution, 

• MRT - maximum riding time, 

• NDC - number of distribution centers. 

All the objectives are minimized. 

The first objective is defined in the following way: 

$$TDC = \sum_{i=1}^{I} \left( \min_l \{ ATCA_{il} + ACCA_{il} \} + \min_l \{ ATCB_{il} + ACCB_{il} \} + APHC_i \right),$$

where $ATCA_{il}$ ($ATCB_{il}$) denotes annual transportation cost of products from production 
plant A (B) to distribution center at location $i$ with the use of vehicles of $l$-th type, 
$ACCA_{il}$ ($ACCB_{il}$) denotes annual cost of capital locked up in pallets of products from 
production plant A (B) at location $i$ with the use of vehicles of $l$-th type, and $APHC_i$ 
denotes total annual cost of pallet handling in the distribution center at location $i$. 

Annual transportation cost takes into account transportation from production plants 
to distribution centers and transportation from distribution centers to the customers in 
the regions. It is defined in the following way: 

$$ATCA_{il} = TCA_{il} \sum_{j=1}^{J} x_{ij} DA_j + \sum_{j=1}^{J} x_{ij} TC_{ij} DA_j.$$



Annual cost of pallets handling in the distribution centers is defined in the 
following way: 



APHCi 







J 



Annual cost of locked-up capital is defined in the following way: 

$$ACCA_{il} = \max \left\{ MSSA + \sum_{j=1}^{J} x_{ij} DA_j \, CCA \, DHA_{il}, \;\; \sum_{j=1}^{J} x_{ij} DA_j \, CCA \, (CRT + DHA_{il}) \right\},$$


where $DHA_{il}$ and $DHB_{il}$ are average headways of deliveries for production plants A 
and B, respectively, with the use of vehicles of $l$-th type. They are defined in the 
following way: 

$$DHA_{il} = ML_l \Big/ \left( \sum_{j=1}^{J} x_{ij} DA_j \Big/ DY \right).$$

Maximum riding time to the first customer on a route is defined in the following 
way: 

$$MRT = \max_{i,j} \{ x_{ij} TT_{ij} \}.$$

Number of distribution centers is defined in the following way: 

$$NDC = \sum_{i=1}^{I} y_i.$$
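The two simpler objectives can be sketched directly from their definitions. The example below assumes a compact representation in which `assign[j]` holds the location index of the distribution center serving region `j`, so that each region is assigned to exactly one center, and it treats a location as included in the plan exactly when at least one region is assigned to it. The data values are made up for illustration.

```python
def max_riding_time(assign, travel_time):
    """MRT: largest travel time TT[i][j] over all assigned (location, region) pairs."""
    return max(travel_time[i][j] for j, i in enumerate(assign))


def number_of_centers(assign):
    """NDC: number of locations with at least one assigned region."""
    return len(set(assign))


if __name__ == "__main__":
    # travel_time[i][j]: minutes from location i to the first customer in region j
    travel_time = [
        [10, 20, 30],
        [5, 5, 5],
        [40, 50, 15],
    ]
    assign = [0, 0, 2]   # regions 0 and 1 served from location 0, region 2 from location 2
    print(max_riding_time(assign, travel_time))
    print(number_of_centers(assign))
```

With this representation the constraint that each region belongs to exactly one center holds by construction, and $x_{ij} \le y_i$ holds because $y_i$ is derived from the assignment.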



4 Methods Used in the Experiment 



4.1 Multiple Objective Genetic Local Search 

Genetic Local Search (GLS) is a metaheuristic algorithm that hybridizes 
recombination operators with local search or, more generally, with other local 
improvement heuristics. Other frequently used names are Memetic Algorithms or 
Hybrid Genetic Algorithms. It is quite difficult to trace a single origin of GLS. To 
our knowledge, the first description of GLS was published by Ackley, but similar 
algorithms were probably developed completely independently by several authors. 
Heuristics based on the GLS scheme often prove to be extremely efficient in 
combinatorial optimization. In particular, methods of this type were able to improve 
the best known results on a number of well-studied combinatorial optimization 
problems, e.g. the Travelling Salesperson Problem, the Graph Coloring Problem and the 
Quadratic Assignment Problem. Thus, in our opinion, development of multiple 
objective versions of GLS is one of the most promising directions of research. 

In this paper, we use the multiple objective genetic local search (MOGLS) algorithm 
proposed by the first author. In earlier work the algorithm was tested on the multiple 
objective Travelling Salesperson Problem and outperformed the MOGLS algorithm of 
Ishibuchi and Murata, a MOGLS based on the idea of the MOSA method proposed by 
Ulungu et al. [12], and the Pareto ranking-based evolutionary algorithm proposed by 
Fonseca and Fleming. The algorithm also outperformed the Strength Pareto Approach 
(SPEA) algorithm [15] and the Memetic-Pareto Archive Evolution Strategy (M-PAES) 
on the 0/1 multiple objective knapsack problem. Below we describe the main ideas of 
the method. 

The goal of multiple objective metaheuristics is to generate good approximations 
to the nondominated set. Of course, the best possible approximation is the whole 
nondominated set itself. Note that all weighted Chebycheff scalarizing functions have 
optima in the nondominated set and each nondominated point is an optimum of a 
weighted Chebycheff scalarizing function (see Section 2). Thus, finding the whole 
nondominated set is equivalent to finding optima of all weighted Chebycheff 
scalarizing functions. Hence, we reformulate the goal of multiple objective 
metaheuristics as simultaneous optimization of all weighted Chebycheff scalarizing 
functions. In fact, it is enough to consider all weighted Chebycheff scalarizing 
functions with normalized weight vectors. 

MOGLS implements the idea of simultaneous optimization of all weighted 
Chebycheff scalarizing functions with normalized weight vectors by random choice 
of a scalarizing function optimized in each iteration. In other words, in each iteration, 
MOGLS tries to improve the value of a randomly selected scalarizing function. A 
single iteration of MOGLS consists of a single recombination of a pair of solutions 
and application of a heuristic that locally improves the value of the current scalarizing 
function. An approximation of the ideal point composed of the best known values on 
each objective is used as the reference point. The reference point changes (improves) 
during the run of the method. The normalized weight vectors are drawn at random 
with uniform probability distribution $p(\Lambda)$, i.e. a distribution for which: 

$$\forall \Psi' \subseteq \Psi: \quad \int_{\Lambda \in \Psi'} p(\Lambda) \, d\Lambda \Big/ \int_{\Lambda \in \Psi} p(\Lambda) \, d\Lambda = V(\Psi') / V(\Psi),$$

where $\Psi$ and $\Psi'$ denote the set of all normalized weights and a subset of it, 
respectively; $V(\Psi)$ and $V(\Psi')$ are Euclidean hyper-volumes of $\Psi$ and $\Psi'$, 
respectively. In other words, the probability of drawing a weight vector belonging to 
$\Psi'$ is proportional to the hyper-volume of $\Psi'$. 
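One standard way to draw normalized weight vectors uniformly is to cut the unit interval at J - 1 independent uniform points and use the segment lengths as weights. The sketch below uses that technique; it is offered as one possible realization of the uniform distribution described above and is not claimed to be the procedure used in the authors' implementation.

```python
import random


def random_normalized_weights(n_objectives, rng=random):
    """Draw a weight vector uniformly from the set of normalized weight vectors.

    The unit interval is cut at n_objectives - 1 uniform random points; the
    lengths of the resulting segments are nonnegative and sum to one.
    """
    cuts = sorted(rng.random() for _ in range(n_objectives - 1))
    bounds = [0.0] + cuts + [1.0]
    return [bounds[i + 1] - bounds[i] for i in range(n_objectives)]


if __name__ == "__main__":
    rng = random.Random(2001)          # seeded generator for repeatability
    print(random_normalized_weights(3, rng))
```

Simply drawing independent uniform numbers and dividing by their sum would also give normalized weights, but the resulting distribution over the weight set is not uniform.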

In our opinion, the main reason for the good performance of the single objective 
GLS is that the method combines features of two good solutions. Offspring produced 
in this way are good starting points for heuristic improvement. Analogously, MOGLS 
should combine features of solutions that are already good on the current scalarizing 
function. In each iteration, our MOGLS algorithm constructs a temporary population 
composed of K solutions being the best among known solutions on the current 
scalarizing function. Then, two different solutions are drawn for recombination from 
the temporary population with uniform probability. The algorithm of MOGLS is 
summarized in Fig. 1. 

The outcome of MOGLS is an approximation to the nondominated set PP 
composed of all potentially nondominated points (and corresponding potentially 
Pareto-optimal solutions) generated by the method. Updating the set of potentially 
nondominated points PP with solution x consists of: 

• adding f(x) to PP if no point in PP dominates f(x), 

• removing from PP all points dominated by f(x). 
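The update of the set PP described by the two rules above can be sketched as follows. Points are tuples of minimized objective values. Skipping a point that is already present in PP is an added assumption, since the text does not say how duplicates are handled; the function names are illustrative.

```python
def dominates(z1, z2):
    """True if point z1 dominates point z2 (all objectives minimized)."""
    return (all(a <= b for a, b in zip(z1, z2))
            and any(a < b for a, b in zip(z1, z2)))


def update_pp(pp, z):
    """Return PP updated with point z.

    z is added only if no point in PP dominates it; every point of PP
    dominated by z is then removed. Duplicates are ignored (assumption).
    """
    z = tuple(z)
    if z in pp or any(dominates(p, z) for p in pp):
        return pp
    return [p for p in pp if not dominates(z, p)] + [z]


if __name__ == "__main__":
    pp = []
    for point in [(2, 2), (1, 3), (3, 3), (1, 2)]:
        pp = update_pp(pp, point)
        print(point, "->", pp)
```

Because a dominated candidate is rejected before any removal takes place, PP always remains a set of mutually nondominated points.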

In general, the scalarizing functions with random normalized weight vectors should 
be applied to normalized objective function values (see Section 2). The objectives 
could e.g. be normalized with the range equalization factors based on the objectives’ 
ranges in PP. This approach is used in the experiment. 

The original idea of the method was to store all generated solutions in the current 
set of solutions CS from which temporary populations are selected. Storing and 
handling all solutions would, however, be very time and memory consuming. Thus, 
set CS is organized as a queue of size K×S. In each iteration the newly generated 
solution is added to the beginning of the queue if it is better than the worst solution in 
the temporary population and different from all solutions in the temporary population. If 
the size of the queue is bigger than K×S then the last solution from the queue is 
removed. 
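The bounded queue behaviour of CS can be sketched with Python's `collections.deque`, whose `maxlen` argument discards items from the opposite end when a full deque receives a new item. This is only an illustration of the data structure; the helper name `make_current_set` is hypothetical and the acceptance test for new solutions is omitted.

```python
from collections import deque


def make_current_set(k, s):
    """Create an empty current set CS holding at most k * s solutions."""
    return deque(maxlen=k * s)


if __name__ == "__main__":
    cs = make_current_set(2, 2)        # capacity K x S = 4
    for solution in ["a", "b", "c", "d", "e"]:
        cs.appendleft(solution)        # newest solution goes to the front
    # The oldest solution "a" has been dropped from the end of the queue.
    print(list(cs))
```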



4.2 Multiple Start Local Search Algorithm 

This simple algorithm corresponds to the Initialization, Generation of the first 
approximation of the ideal point and Generation of the initial set of solutions phases 
of the MOGLS algorithm. At each iteration the algorithm draws at random a weight 
vector, constructs a new random solution and applies local search to it. The outcome of 
the method is the set of potentially Pareto-optimal solutions found among the local 
optima. The algorithm is denoted by MOMSLS. 

Parameters: K - size of the temporary population, optional parameter S - 
number of initial solutions, stopping criterion 
Initialization: 
    The set of potentially Pareto-optimal solutions PP := ∅ 
    The current set of solutions CS := ∅ 
Generation of the first approximation of the ideal point: 
    for each objective f_j 
        Construct randomly a new feasible solution x 
        Improve locally objective f_j starting from solution x obtaining x’ 
        Add x’ to the current set of solutions CS 
        Update set PP with x’ 
Generation of the initial set of solutions: 
    repeat 
        Draw at random a weight vector Λ 
        Construct randomly a new feasible solution x 
        Improve locally the scalarizing function s with weight vector Λ starting 
        from solution x obtaining x’ 
        Add x’ to the current set of solutions CS 
        Update set PP with x’ 
    until CS contains S solutions 
Main loop: 
    repeat 
        Draw at random a weight vector Λ 
        From CS select K different solutions being the best on scalarizing 
        function s forming temporary population TP 
        Draw at random with uniform probability two solutions x1 and x2 from TP 
        Recombine x1 and x2 obtaining x3 
        Improve locally the scalarizing function s with weight vector Λ starting 
        from solution x3 obtaining x3’ 
        if x3’ is better on s than the worst solution in TP and different in the 
        decision space from all solutions in TP then 
            Add x3’ to the current set of solutions CS 
            Update set PP with x3’ 
    until the stopping criterion is met 

Outcome: the set of potentially Pareto-optimal solutions PP. 

Fig. 1. Algorithm of multiple objective genetic local search - MOGLS. 

4.3 Pareto Ranking Based Multiple Objective Genetic Algorithm 

At present, probably the most often used multiple objective metaheuristics are multiple 
objective genetic algorithms based on Pareto ranking (compare [ch. 3.3.2.2 and 
3.3.2.3] and [17]). Although algorithms of this kind differ in some details, they share 
the idea of assigning solutions fitness on the basis of a ranking induced by the 
dominance relation. In this paper we use the particular implementation of this idea 
proposed by Fonseca and Fleming. The algorithm uses fitness sharing in the 
objective space. 

The algorithm of Fonseca and Fleming was extended by the use of the set of potentially 
Pareto-optimal solutions PP (secondary population) updated with each newly 
generated solution. The algorithm is denoted by Pareto GA. 



4.4 Pareto Ranking Based Multiple Objective Genetic Local Search Algorithm 

This algorithm hybridizes Pareto GA with local search. The starting population is 
composed of local optima obtained in the same way as in MOGLS and MOMSLS. In 
addition, after each recombination local search is applied to each offspring. The 
scalarizing function to be optimized is drawn at random in the same way as in 
MOGLS. The algorithm is denoted by Pareto GLS. 



5 Adaptation of the Methods to the Distribution System Design 
Problem 

The solutions are encoded by I lists of regions; the i-th list contains the regions assigned to 
the distribution center at location i. If the list is empty, there is no distribution center 
at this location (no minimum safety stock is maintained). 

Initial solutions are constructed by assigning each region to a distribution center at 
a randomly selected location. 

The idea of the recombination operator is to preserve important features common 
to both parents, i.e. 

• common selections of distribution centers at the same locations, 

• common assignments of regions to distribution centers at the same locations. 

The details of the recombination operator algorithm are presented in Fig. 2. 

Local search is used to improve the value of the weighted linear scalarizing 
function. We also tested a version of local search that improves the weighted 
Chebycheff scalarizing function, but it resulted in worse performance of all the 
algorithms. Note that one of the quality measures is the average value of the weighted 
Chebycheff scalarizing function. The value of this measure was, however, always 
better if the linear function was used in local search. 

Local search uses neighborhood move that shifts a region from a distribution center 
at a location to a distribution center at a different location. 

Greedy version of local search is used. It tests the neighborhood moves in random 
order and performs the first improving move. The local search algorithm is stopped 
when no improving move is found after testing all the possible neighborhood moves, 
which means that a local optimum is achieved. 
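The greedy (first-improvement) local search with the shift move can be sketched as below. The sketch assumes a solution is a list in which `assign[j]` is the location serving region `j`, and it takes the function to be minimized as a generic `cost` callable; in the paper this would be the weighted linear scalarizing function, which is not reproduced here. All names are illustrative.

```python
import random


def greedy_local_search(assign, n_locations, cost, rng=random):
    """First-improvement local search using the region-shift neighborhood.

    Moves are tested in random order and the first improving move is
    performed. The search stops when a full pass finds no improving move.
    """
    assign = list(assign)                       # do not modify the caller's list
    improved = True
    while improved:
        improved = False
        moves = [(j, i) for j in range(len(assign))
                 for i in range(n_locations) if i != assign[j]]
        rng.shuffle(moves)
        current = cost(assign)
        for j, i in moves:
            old = assign[j]
            assign[j] = i                       # shift region j to location i
            if cost(assign) < current:
                improved = True                 # keep the move, restart the scan
                break
            assign[j] = old                     # undo a non-improving move
    return assign


if __name__ == "__main__":
    # Toy cost: the sum of location indices, minimized by assigning all regions to 0.
    print(greedy_local_search([2, 1, 2], 3, sum, random.Random(1)))
```

Re-evaluating the full cost for every move keeps the sketch short; a practical implementation would compute the change in cost incrementally.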

The mutation operator shifts a randomly selected region to a distribution center at a 
randomly selected potential location. 







for each potential location of distribution center 
    if the distribution center at this location has non-empty list of assigned 
    regions in both parents 
        Mark the location as preliminarily selected 
    else if the distribution center at this location has non-empty list of 
    assigned regions in one of the parents 
        Mark the location as preliminarily selected with probability 0.5 

for each region 
    if the region is assigned in both parents to a distribution center at the 
    same location 
        Assign the region to the distribution center at the same location 
    else /* The region is assigned to distribution centers at different 
    locations */ 
        if both the locations are marked as preliminarily selected 
            Select at random one of the locations and assign the region 
        else if one of the locations is marked as preliminarily selected 
            Assign the region to the distribution center at this location 
        else 
            Mark a randomly selected location as preliminarily selected and 
            assign the region to the distribution center at this location 

Fig. 2. The algorithm of the recombination operator. 
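The recombination operator of Fig. 2 can be sketched as follows. The sketch again assumes that a solution is a list in which `assign[j]` is the location serving region `j`. In the last branch, Fig. 2 does not state from which set the random location is drawn; choosing between the two parental locations is an assumption made here.

```python
import random


def recombine(parent1, parent2, n_locations, rng=random):
    """Recombine two region-to-location assignments following Fig. 2."""
    used1, used2 = set(parent1), set(parent2)
    selected = set()
    for loc in range(n_locations):
        if loc in used1 and loc in used2:
            selected.add(loc)                   # used by both parents
        elif (loc in used1 or loc in used2) and rng.random() < 0.5:
            selected.add(loc)                   # used by one parent: keep with p = 0.5
    child = []
    for a, b in zip(parent1, parent2):
        if a == b:
            child.append(a)                     # common assignment is preserved
        elif a in selected and b in selected:
            child.append(rng.choice((a, b)))
        elif a in selected:
            child.append(a)
        elif b in selected:
            child.append(b)
        else:
            loc = rng.choice((a, b))            # assumption: pick a parental location
            selected.add(loc)
            child.append(loc)
    return child


if __name__ == "__main__":
    p1 = [0, 0, 1, 2]
    p2 = [0, 1, 1, 3]
    print(recombine(p1, p2, 4, random.Random(3)))
```

Regions on which the parents agree always keep their location, so identical parents yield an identical child.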



6 Computational Experiment 

6.1 Quality Measures 

One of the quality measures used in the experiment is the coverage of two 
approximations to the nondominated set. Let A and B be two approximations. The 
coverage measure is defined as: 

$$C(A, B) = \frac{\left| \{ z'' \in B \mid \exists \, z' \in A : z' \text{ dominates or is equal to } z'' \} \right|}{|B|}.$$

The value C(A, B) = 1 means that all points in B are dominated by or equal to 
(covered by) some points in A. The value C(A, B) = 0 means that no point in B is 
covered by any point in A. Note that in general C(A, B) ≠ C(B, A). This quality 
measure has been used in previous comparative studies. 
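The coverage measure can be computed directly from its verbal definition: count the points of B that are dominated by or equal to some point of A and divide by the size of B. The sketch assumes minimized objectives and nonempty B; the function names are illustrative and are not those of MOEALIB++.

```python
def covers(z1, z2):
    """True if z1 dominates z2 or is equal to it (all objectives minimized)."""
    return all(a <= b for a, b in zip(z1, z2))


def coverage(approx_a, approx_b):
    """C(A, B): fraction of points in B covered by at least one point in A."""
    covered = sum(1 for b in approx_b if any(covers(a, b) for a in approx_a))
    return covered / len(approx_b)


if __name__ == "__main__":
    a = [(1, 1)]
    b = [(2, 2), (0, 3)]
    print(coverage(a, b))   # (2, 2) is covered by (1, 1); (0, 3) is not
    print(coverage(b, a))   # no point of b covers (1, 1)
```

The example also shows the asymmetry noted in the text: the two directions of the measure have to be reported separately.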

As the second quality measure we use the estimation of the expected value of the 
weighted Chebycheff scalarizing function on approximation A over the set of 
normalized weight vectors $\Psi$: 

$$R(A) = E\left( s^*_\infty(z^0, A, \Lambda) \right) = \int_{\Lambda \in \Psi} s^*_\infty(z^0, A, \Lambda) \, p(\Lambda) \, d\Lambda,$$

where $s^*_\infty(z^0, A, \Lambda) = \min_{z \in A} \{ s_\infty(z, z^0, \Lambda) \}$ is the best value achieved 
by function $s_\infty(z, z^0, \Lambda)$ on approximation A. 

In order to estimate the expected value we use numerical integration, i.e. we 
calculate the average value of the weighted Chebycheff scalarizing function over a large 
sample of systematically generated normalized weight vectors. We use all normalized 
weight vectors in which each individual weight takes on one of the following values: 
$\{ l/k, \; l = 0, \ldots, k \}$, where $k$ is a sampling parameter. The set of such weight 
vectors is denoted by $\Psi_s$ and defined mathematically as: 

$$\Psi_s = \left\{ \Lambda = [\lambda_1, \ldots, \lambda_J] \in \Psi \;\middle|\; \lambda_j \in \left\{ 0, \tfrac{1}{k}, \tfrac{2}{k}, \ldots, \tfrac{k-1}{k}, 1 \right\} \right\}.$$

With a combinatorial argument, we notice that this produces 
$\binom{k + J - 1}{J - 1}$ weight vectors. The estimation of measure R is calculated as: 

$$R(A) = \frac{\sum_{\Lambda \in \Psi_s} s^*_\infty(z^0, A, \Lambda)}{|\Psi_s|}.$$

The parameter k was set to 50. 
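The systematic weight set and the averaged Chebycheff measure can be sketched as follows. Weight vectors are enumerated with a "stars and bars" construction, which also yields the count given by the binomial coefficient. The function names are illustrative; this is not the MOEALIB++ code.

```python
from itertools import combinations
from math import comb


def systematic_weights(n_objectives, k):
    """Yield every normalized weight vector whose entries are multiples of 1/k."""
    slots = k + n_objectives - 1
    for bars in combinations(range(slots), n_objectives - 1):
        parts, prev = [], -1
        for bar in bars + (slots,):
            parts.append(bar - prev - 1)   # number of "stars" between two bars
            prev = bar
        yield tuple(p / k for p in parts)


def chebycheff(z, z_ref, weights):
    """Weighted Chebycheff scalarizing function."""
    return max(w * (zj - rj) for w, zj, rj in zip(weights, z, z_ref))


def estimate_r(approximation, z_ref, k):
    """Average, over the systematic weights, of the best Chebycheff value on A."""
    total, count = 0.0, 0
    for weights in systematic_weights(len(z_ref), k):
        total += min(chebycheff(z, z_ref, weights) for z in approximation)
        count += 1
    return total / count


if __name__ == "__main__":
    # Three objectives and k = 50, as in the experiment.
    print(sum(1 for _ in systematic_weights(3, 50)), comb(52, 2))
    print(estimate_r([(1.0, 1.0)], (0.0, 0.0), 2))
```

In practice the objective values would be normalized with the range equalization factors before the measure is evaluated, so that the weights act on comparable scales.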

The C++ code used to calculate the two quality measures is included in the 
MOEALIB++ library. 



6.2 Experiment Design 

For confidentiality reasons the real life data have been slightly modified. All of 
them, however, are in realistic ranges. 

Each method has been run 10 times. MOGLS has used a temporary population of 
size 30. The number of initial solutions has been set equal to 60. This value has been 
set following a previously proposed approach. This number of initial solutions assures that on 
average the 30 best solutions on a randomly chosen weighted linear scalarizing function 
are of the same quality as 30 local optima of the same function. The sets of potentially 
Pareto-optimal solutions generated in each run have been saved after 60, 120,..., 600 
recombinations. 

Pareto GLS has used a population of size 60. In other words, Pareto GLS has 
started by generating the same number of local optima as MOGLS. The number of 
generations has been set to 11. This has assured the same number of recombinations 
as in the case of MOGLS. The sets of potentially Pareto-optimal solutions have been 
saved after the same numbers of recombinations as in the case of MOGLS. The 
mutation probability has been set equal to 0.1. 

MOMSLS has been performing 660 local searches, i.e. the same number of local 
searches as MOGLS and Pareto GLS. The sets of potentially Pareto-optimal solutions 
have been saved after 120, 180,..., 660 local searches, i.e. after the same number of 
local searches as in the case of MOGLS and Pareto GLS. 

Pareto GA has used a population of size 100 and has been performing 400 
generations. As this method does not use local search in performs more 
recombinations per second than Pareto GLS. The above parameters assured running 
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times comparable to MOGLS and Pareto GLS. The mutation probability has been set 
equal to 0.1. 



Table 1. Average running times of the methods (in seconds).

MOGLS                        Pareto GLS                   MOMSLS                       Pareto GA
Recombinations   Time [s]    Recombinations   Time [s]    Local searches   Time [s]    Generations   Time [s]
60               66.1        60               67          120              116.1       40            15.7
120              72          120              76.3        180              174.6       80            34.4
180              76.9        180              84.6        240              233.2       120           53.6
240              81          240              92.6        300              291.6       160           73.2
300              85          300              100.8       360              350.8       200           92.7
360              89.2        360              108.9       420              409.7       240           111.4
420              92.7        420              117.2       480              468.3       280           130.8
480              96.3        480              125.7       540              526.9       320           150
540              100         540              134.2       600              585.8       360           168.5
600              103.9       600              142         660              644.4       400           187.1




Fig. 3. Results of comparison with expected value of weighted Chebycheff scalarizing function.
The chart shows the ± standard deviation range of the quality measure (plotted series: MOGLS
and Pareto GLS).



6.3 Results 

Table 1 presents average running times of the methods. The dispersion of the running
times after a given number of iterations is not given; it has been, however, very low.
Note that MOMSLS has required much more time than MOGLS and Pareto GLS to
perform the same number of local searches. This is caused by the fact that solutions
obtained by recombination are much closer to local optima than random solutions.
Thus, local search needs fewer iterations to reach a local optimum.

Table 2 presents results of the evaluation with the expected value of the weighted Chebycheff
scalarizing function quality measure. The results clearly indicate that the two methods
hybridizing recombination and local search operators by far outperform methods that
use one of the operators alone. In particular, the comparison of Pareto GLS, Pareto GA
and MOMSLS is a clear example of synergy of the two operators. Pareto GLS is a
simple combination of the two latter methods and does not include any element that
cannot be found in Pareto GA or MOMSLS.



Table 2. Results of comparison with expected value of weighted Chebycheff scalarizing
function (quality measure R; Std. dev. is the standard deviation of R).

MOGLS                                   Pareto GLS
Recombinations   R        Std. dev.     Recombinations   R        Std. dev.
60               0.099    0.005         60               0.111    0.006
120              0.089    0.004         120              0.101    0.004
180              0.082    0.006         180              0.093    0.007
240              0.078    0.005         240              0.088    0.006
300              0.077    0.005         300              0.084    0.006
360              0.075    0.004         360              0.082    0.006
420              0.072    0.004         420              0.080    0.004
480              0.070    0.003         480              0.078    0.004
540              0.068    0.002         540              0.078    0.004
600              0.067    0.003         600              0.077    0.005

MOMSLS                                  Pareto GA
Local searches   R        Std. dev.     Generations      R        Std. dev.
120              0.236    0.009         40               0.402    0.023
180              0.233    0.008         80               0.389    0.028
240              0.232    0.007         120              0.387    0.027
300              0.229    0.008         160              0.383    0.025
360              0.227    0.006         200              0.378    0.024
420              0.227    0.007         240              0.371    0.022
480              0.226    0.006         280              0.367    0.023
540              0.226    0.006         320              0.363    0.024
600              0.224    0.006         360              0.357    0.020
660              0.224    0.006         400              0.355    0.021




Furthermore, MOGLS outperforms Pareto GLS. It generates better approximations
to the nondominated set in shorter time. In our opinion, this is caused by the fact that
MOGLS selects solutions for recombination on the basis of the same scalarizing
function that is optimized by local search. Furthermore, the solutions are selected
from a larger set of known solutions. On the other hand, MOGLS has higher
memory requirements than Pareto GLS. Fig. 3 illustrates the performance of MOGLS and
Pareto GLS.

Fig. 4. Results of the comparison with coverage measure. Each chart contains 10 box plots
representing the distribution of C values for a certain ordered pair of algorithms. The 10 box
plots correspond to the ten sets of potentially Pareto-optimal solutions saved by each method in
each run (the first set on the left). The scale is 0 at the bottom and 1 at the top for each chart.
The chart in the row of algorithm A1 and the column of algorithm A2 presents values of coverage of
approximations generated by A2 by approximations generated by A1. The charts show the ± standard
deviation range of the quality measure.

Fig. 4 presents results of the comparison with the coverage measure. The results indicate
the same ranking of the four methods as the evaluation with the expected value of the
weighted Chebycheff scalarizing function quality measure. Note that no method is able to
produce approximations to the nondominated set covering 100% of approximations
generated by other methods. This indicates that no method is able to find all
Pareto-optimal solutions.
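
For readers unfamiliar with the coverage measure, a minimal Python sketch is given here. It assumes the commonly used definition, in which C(A1, A2) is the fraction of points in A2 that are weakly dominated by at least one point in A1, and it assumes that all objectives are minimized; the precise definition and the optimization direction used in the experiment lie outside this passage, and the function names are hypothetical.

```python
def weakly_dominates(a, b):
    """True if point a is at least as good as point b in every objective (minimization assumed)."""
    return all(x <= y for x, y in zip(a, b))


def coverage(A1, A2):
    """Fraction of the points in A2 covered (weakly dominated) by some point in A1."""
    covered = sum(1 for b in A2 if any(weakly_dominates(a, b) for a in A1))
    return covered / len(A2)


# The measure is not symmetric, so both ordered pairs are reported.
set_a = [(1.0, 1.0)]
set_b = [(2.0, 2.0), (0.0, 3.0)]
c_ab = coverage(set_a, set_b)
c_ba = coverage(set_b, set_a)
```

Because C(A1, A2) and C(A2, A1) need not sum to one, charts such as those in Fig. 4 show one panel per ordered pair of algorithms.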



7 Conclusions 



Four multiple objective evolutionary algorithms have been tested on a real-life
combinatorial optimization problem corresponding to distribution system design.
The results of the computational experiment indicate that:

• hybridization of recombination and local search operators may boost
performance of multiple objective evolutionary algorithms,

• the choice of solutions for recombination has a crucial influence on the
performance of genetic local search algorithms.

Abstract. A new method of constraint handling for multi-objective
Pareto optimization is proposed. The method is compared to an approach
in which each constraint function is treated as a separate objective in a
Pareto optimization. The new method reduces the dimensionality of the
optimization problem by representing the constraint violations by a single
"infeasibility objective". The performance of the method is examined
using two constrained multi-objective test problems. It is shown that the
method results in solutions that are equivalent to the constrained Pareto
optimal solutions for the true objective functions. It is also concluded
that the reduction in dimensionality of the problem results in a more
transparent set of solutions. The method retains the elegance of the underlying
Pareto optimization and does not preclude the representation of a
constraint as an objective function where this is considered important.
The method is easily implemented and has no parameters to be tuned.



1 Introduction 

The aim of multi-objective optimization is to provide a set of non-dominated 
solutions that minimize several often conflicting criteria. Many multi-objective 
optimization problems are also constrained so that any optimization algorithm 
applied to solve them must ensure that the set of optimum solutions satisfies all 
constraints. Among other approaches (reviewed in [2]), two common approaches
to handling constraints have been adopted for use with Pareto-based ranking
methods: first, the penalization of the rank of infeasible individuals, and second,
the transformation of the constraints to objectives.

Narayanan and Azarm [?] penalized the rank of infeasible individuals using an
exterior penalty function, the penalty only being applied to the non-dominated
but infeasible solutions. This approach has the disadvantage that the severity of
the penalty has to be carefully chosen, especially during the early generations
of a highly constrained problem (too severe a penalty may prevent any feasible
solutions from being found).

Fonseca and Fleming [?] overcame the difficulty of choosing a penalty weight
by treating the constraints as objectives in the Pareto ranking. The approach
has the advantage that there are no additional parameters to be chosen in order
to operate the algorithm with constraints. Under some circumstances, it also has
the advantage that the optimum relationship between the true criteria and the
constraints can be analyzed. However, for problems having many constraints, the
interrelationship between the criteria and constraints is difficult, if not impossible,
to interpret. It is also possible that the increase in the number of objectives (and
hence the dimension of the problem) will demand an increase in the population size required
to give a good representation of the non-dominated solutions. As a consequence
of the increased dimensionality and population size, the time required to reach
convergence may also increase.

This paper examines an approach that combines the constraint violations
to give a single measure of an individual's infeasibility. The infeasibility is then
treated as an objective in the Pareto ranking of the solutions. The approach has
the advantage that it retains the elegance of the Fonseca and Fleming approach,
but the number of objectives is reduced, which makes the interrelationship
between them easier to interpret. The rationale applied in this approach is that
the constraints are considered to be bounds on the problem and that the
interrelationship between constraints and objectives is of no interest. Where the
impact of a constraint on the objectives is considered important, the
constraint may still be represented as an objective (with any remaining constraints
combined to produce an infeasibility objective).

1.1 The Multi-objective Genetic Algorithm 

The genetic algorithm (GA) used in this study is a simple binary encoded GA
with "roulette wheel" selection, single point cross-over and a non-overlapping
population. The fitness assignment is based on an exponentially weighted inverse
of the Pareto rank of each individual [2]. A goal attainment method [2] has also
been implemented so that the search can be directed towards a particular region
of the solution space (notably, the feasible region). The method penalizes the
Pareto rank of an individual according to the extent of its goal violation. Finally,
a phenotypic sharing function is used to ensure that a reasonable spread of solutions
is obtained [?, ?].

2 The Infeasibility Objective 

The approach advocated here is that a single measure of a solution's infeasibility
can be developed and used as an objective in a Pareto optimization. Since
it is also considered that the constraints are acting as bounds on the problem,
it is only the solutions that lie beyond the bounds that are of concern. The
measure of infeasibility should therefore represent both the number of active
constraints and the extent to which each constraint is violated. A measure of
infeasibility that has these properties is the sum of the constraint values for all
violated constraints. This can be evaluated in three stages. First, the inequality
constraints ($g_j(X)$) are formulated such that they are negative when feasible
and the equality constraints ($h_j(X)$) zero when feasible (Equations 1 and 2).
Second, the feasible constraint values are reset as zero and infeasible values as
positive (Equation 3). Finally, the solution's infeasibility ($c(X)$) is taken as the
normalized sum of the reset constraint values (Equation 4). The solution's
infeasibility ($c(X)$) is subsequently referred to as the infeasibility objective, which
once minimized (to zero), ensures all constraints are satisfied.

\[ g_j(X) \le 0, \quad (j = 1, \ldots, q) \qquad (1) \]

\[ h_j(X) = 0, \quad (j = q+1, \ldots, m) \qquad (2) \]

\[ c_j(X) = \begin{cases} \max(0, g_j(X)), & \text{if } 1 \le j \le q \\ |h_j(X)|, & \text{if } q < j \le m \end{cases} \qquad (3) \]

\[ c(X) = \sum_{j=1}^{m} \frac{c_j(X)}{c_{\max j}} \qquad (4) \]



Normalizing the constraint violations (by dividing by the scaling factor
$c_{\max j}$) is necessary since large differences in the magnitude of the constraint
values can lead to dominance of the infeasibility by the constraints having the
highest values. In the procedure implemented here, the scaling factor for each
constraint ($c_{\max j}$) is taken as the maximum value of the constraint violation
found in the initial randomly generated population. If no infeasible solutions
were found, the scaling factor is set equal to unity. This has the effect that if
in subsequent generations the search moves to a region in which the constraint
becomes active, its effect on the solution's infeasibility will depend on the magnitude
of the constraint violation, with constraint values much greater than unity
having the most effect (since the normalization of the constraints tends to give values
in the order of unity). This may become a problem for weakly constrained
problems with a high degree of feasibility, since a randomly generated population
may well result in a number of unity scaling factors. However, the focus of this
approach is on the solution of highly constrained problems that are unlikely to
yield any feasible solutions from a randomly generated population. The scaling
factor is static and has been taken from the initial population so that for given
constraint violations, the magnitude of the infeasibility objective is consistent in every
generation. This allows solutions from each generation to be included in the
Pareto ranking of subsequent generations without the need to re-evaluate the
infeasibility objective.
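
The three stages and the static scaling can be sketched in Python as follows. This is a minimal illustration rather than the authors' implementation: the function names are hypothetical, and taking the absolute value of an equality constraint as its violation is an assumption consistent with the requirement that infeasible values be positive.

```python
def constraint_violations(g_values, h_values):
    """Reset feasible constraint values to zero and infeasible ones to positive values.

    g_values: inequality constraint values, feasible when <= 0.
    h_values: equality constraint values, feasible when == 0 (abs() is an assumption).
    """
    return [max(0.0, g) for g in g_values] + [abs(h) for h in h_values]


def scaling_factors(initial_violations):
    """Per-constraint maximum violation in the initial population, or 1.0 if none is violated."""
    num_constraints = len(initial_violations[0])
    factors = []
    for j in range(num_constraints):
        worst = max(v[j] for v in initial_violations)
        factors.append(worst if worst > 0.0 else 1.0)
    return factors


def infeasibility(violations, factors):
    """Normalized sum of the constraint violations; zero for a feasible solution."""
    return sum(v / f for v, f in zip(violations, factors))


# Two hypothetical individuals, each with two inequality and one equality constraint.
initial = [
    constraint_violations([-1.0, 2.0], [0.5]),
    constraint_violations([-3.0, 4.0], [0.0]),
]
factors = scaling_factors(initial)          # computed once, then kept static
first_c = infeasibility(initial[0], factors)
```

Because the factors are computed once from the initial population, an individual's infeasibility value stays the same in every later generation, which is the property the text relies on when carrying solutions forward into subsequent rankings.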

Although the infeasibility measure can be used directly as an objective function
in a Pareto optimization, it is necessary to use the goal attainment method
[?] to direct the optimization towards the feasible solutions (this is also the case
where each constraint is treated as an individual objective). Since all feasible
solutions have the same infeasibility objective value (zero), the infeasibility
objective is excluded from the Pareto ranking when it has a value of zero. This
results in the ranking for feasible solutions being a function of only the true
objectives, which in turn has the effect of reducing the dimensionality of the
problem and thus makes it easier to interpret the solutions.

3 Example Constrained Pareto Optimization

The infeasibility objective approach is investigated through two test problems.
The first test problem is easily visualized and is an adaptation of an established
multi-objective test problem. The second test problem is new and includes both
equality and inequality constraints. The infeasibility objective approach is compared
to the approach which treats all the constraints as individual objectives [?].

Several methods have been proposed for the performance evaluation of multi-objective
genetic algorithms. For instance, Srinivas and Deb [?] describe an approach
to evaluating the distribution of solutions across the range of the Pareto
solution space. Narayanan and Azarm proposed using the Euclidean distance
of the non-dominated points to zero as an indication of the rate of convergence.
Such measures are best used to compare the performance of different multi- 
objective genetic algorithms (MOGA), in solving the same test problem. How- 
ever, here we are in effect using the same MOGA to solve two different problems, 
one where all constraints are represented as objectives, and a transformed prob- 
lem in which the constraint violations are represented by a single infeasibility 
objective. As such, a qualitative judgement is made here as to the effectiveness 
of the infeasibility objective approach. The assessment is based on the ease with 
which the results can be interpreted, and the extent to which the infeasibility 
objective approach produces Pareto optimum solutions. 

3.1 A Four Function Test Problem 

The four function test problem presented here is an adaptation of an existing
two objective test problem (Equations 5 and 6) [?]. A third function (Equation
10) has been added to provide two inequality constraint functions (Equations 8
and 9), giving a total of four test functions ($f_1(X)$, $f_2(X)$, $g_1(X)$, $g_2(X)$). The
test problem can be written as:

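Minimize:

\[ f_1(X) = 1 - \exp\left( -\sum_{i=1}^{n} \left( x_i - 1/\sqrt{n} \right)^2 \right) \qquad (5) \]

\[ f_2(X) = 1 - \exp\left( -\sum_{i=1}^{n} \left( x_i + 1/\sqrt{n} \right)^2 \right) \qquad (6) \]

Subject to:

\[ g_j(X) \le 0.0, \quad \forall j \qquad (7) \]

Where:

\[ g_1(X) = 0.4 - f_3(X) \qquad (8) \]

\[ g_2(X) = f_3(X) - 0.8 \qquad (9) \]

and:

\[ f_3(X) = 1 - \exp\left( -\sum_{i=1}^{n} \begin{cases} \left( x_i - 1/\sqrt{n} \right)^2, & \text{if } i = 1, 3, 5, \ldots \\ \left( x_i + 1/\sqrt{n} \right)^2, & \text{if } i = 2, 4, 6, \ldots \end{cases} \right) \qquad (10) \]
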

In this example, the number of variables (n) has been fixed at 2. A discrete
increment of 0.05 between the variable values has been chosen, and the variable
range set at -2.0 to 2.0. This results in the function values being in the range
0.0 to 1.0. Figure 1 illustrates the test problem surface, with the shaded area
representing the constrained Pareto solution space.
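
A short Python sketch of the four test functions may help when reproducing the problem. It follows Equations 5 to 10 as they are read here; in particular, the sign used for the odd-indexed terms of the third function is an assumption, and the helper names are hypothetical.

```python
from math import exp, sqrt


def f1(x):
    """First objective (Equation 5)."""
    n = len(x)
    return 1.0 - exp(-sum((xi - 1.0 / sqrt(n)) ** 2 for xi in x))


def f2(x):
    """Second objective (Equation 6)."""
    n = len(x)
    return 1.0 - exp(-sum((xi + 1.0 / sqrt(n)) ** 2 for xi in x))


def f3(x):
    """Third function (Equation 10): odd-indexed terms use minus, even-indexed use plus (assumed)."""
    n = len(x)
    total = 0.0
    for i, xi in enumerate(x, start=1):
        shift = -1.0 / sqrt(n) if i % 2 == 1 else 1.0 / sqrt(n)
        total += (xi + shift) ** 2
    return 1.0 - exp(-total)


def g1(x):
    """Inequality constraint (Equation 8), feasible when <= 0."""
    return 0.4 - f3(x)


def g2(x):
    """Inequality constraint (Equation 9), feasible when <= 0."""
    return f3(x) - 0.8


def is_feasible(x):
    return g1(x) <= 0.0 and g2(x) <= 0.0


# Discrete grid used in the example: -2.0 to 2.0 in increments of 0.05.
grid = [round(-2.0 + 0.05 * i, 2) for i in range(81)]
```

Under this reading the two constraints together confine the third function to the band between 0.4 and 0.8.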



It would be expected that treating both constraints as separate objectives
would result in solutions covering the whole of the Pareto surface. Using a
population size of 100 and optimizing for 100 generations, Figure 2 illustrates that
this is the case (with the solutions indicated by a "box").

However, since the infeasibility objective is only included in the Pareto ranking
for infeasible solutions, it would be expected that this approach would produce
a set of solutions that represent the constrained Pareto optimum solutions
for only $f_1(X)$ and $f_2(X)$. This is represented by a line following the upper limit
of the Pareto surface in Figure 1 (the "Pareto front"). Figure 3 illustrates that
this is the case (the solution being indicated by a "box" and the remainder of
the Pareto surface by "circles").




Fig. 1. The Four Function Test Problem (axes: Objective Function f1, Objective Function f2)

Fig. 2. Pareto Solutions for Constraint Function Optimization

Fig. 3. Pareto Solutions for the Infeasibility Objective Approach

It can be seen from Figures 2 and 3 that the solutions from the infeasibility
objective approach are a subset of the solutions obtained when the constraints
are treated as separate objectives. However, for more complex problems, it may
not be so easy to visualize the results and determine which set of solutions is
equivalent to the constrained Pareto optimum solutions for the true problem
objectives. The infeasibility objective approach and its performance are therefore
illustrated further through a more complex test problem.

3.2 A Six Function Test Problem

A six function test problem has been developed that has some of the characteristics
of thermal system optimization problems. For such problems, there is often
a choice to be made between the capital cost and running cost of the system
(Equations 11 and 12). The design of the system is normally restricted by limits
on the fluid velocities (Equation 15), yet the system must also have sufficient
capacity to meet the dynamic load acting on it (Equations 16 and 17). Finally,
the components of the system must be physically connectable (Equation 18).

The test problem provides two true objectives, three inequality constraints
and one equality constraint. The first objective function (Equation 11) is
discontinuous and is only weakly coupled to the second objective function (Equation
12). The discontinuity in $f_1(X)$ and the weak coupling between the two objectives
mean that the search may have difficulty in obtaining an even distribution
of solutions across both objective functions (with the continuous objective, $f_2(X)$,
being biased). It would also be expected that the constraint functions are most
active for the lower values of the first objective ($f_1(X)$), again adding weight to
the solutions for $f_2(X)$.

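Minimize:

\[ f_1(X) = \begin{cases} (83.0 + 14.0 x_1 + 119.0 x_2 + 50.0 x_1 x_2)\, 2.0 x_3, & \text{if } x_2 < 1.4 \\ (83.0 + 14.0 x_1 + 119.0 x_2 + 50.0 x_1 x_2)\, x_3, & \text{else} \end{cases} \qquad (11) \]

\[ f_2(X) = 2.05 (x_4 + 5) \;\ldots\; \frac{7.03}{x_1 x_2} \;\ldots\; \frac{5.87 (x_3 - 1)}{x_1 x_2} \;\ldots\; x_5 \qquad (12) \]

Subject to:

\[ g_j(X) \le 0.0, \quad \forall j \qquad (13) \]

\[ h_j(X) = 0.0, \quad \forall j \qquad (14) \]

Where:

\[ g_1(X) = \frac{\ldots}{x_1 x_2} - 3.0 \qquad (15) \]

\[ g_2(X) = \ldots\; (x_4 + 5.0) \;\ldots\; x_1 x_2 x_3 \;\ldots \qquad (16) \]

\[ g_3(X) = 15.91 - 0.58 x_4 + \exp(12.0 - 0.58 x_4) \;\ldots\; 552.0 \qquad (17) \]

\[ h_1(X) = \operatorname{mod}(100.0\, x_2 x_3,\ 2) \qquad (18) \]
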

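and:

\[ 0.5 \le x_1 \le 2.0, \quad \text{in increments of } 0.05 \]
\[ 0.5 \le x_2 \le 2.0, \quad \text{in increments of } 0.05 \]
\[ 1.0 \le x_3 \le 3.0, \quad \text{in increments of } 1.0 \]
\[ 30.0 \le x_4 \le 40.0, \quad \text{in increments of } 0.01 \]
\[ 600.0 \le x_5 \le 3600.0, \quad \text{in increments of } 5.0 \]
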


Both the method of including constraints as objectives and the infeasibility
objective approach were evaluated using five different randomly generated
populations. The population size in each case was fixed at 100 individuals and the
optimization continued for 100 generations. There was very little variation in the
results between the different initial generations for either of the two approaches.
Figure 4 illustrates the optimum results for the different initial populations for
the infeasibility objective approach. Considering the consistency of the results,
only one set of solutions is presented in the remainder of this paper.

Fig. 4. Pareto Solutions for Infeasibility Objective Approach

It is clear that for two true objective functions the infeasibility objective
approach produces a pay-off curve rather than a surface. Treating the constraints
as objectives necessarily increases the dimensions of the problem and results in
a pay-off surface between the objective functions. Figure 5 illustrates this for the
two approaches, the "dots" being the solutions due to the constraints being treated
as objectives, and the lines with "cross" points being the solutions from the
infeasibility objective approach. Figure 5 shows that the infeasibility objective
approach produces the Pareto optimal solutions for the two true objectives.
The banding in solutions is in part due to the equality constraint function but
mainly due to the discontinuity in the $f_1(X)$ objective function. This is most
pronounced around an $f_1(X)$ value of approximately 800. The discontinuity here
has prevented the infeasibility objective approach from producing solutions that
extend further across the range of $f_1(X)$. For both optimization approaches, the
bias of results is towards objective function $f_2(X)$, and is probably due to the
effect of the discontinuity in $f_1(X)$. Similar results are also evident by comparing
the pay-off between $f_1(X)$ and the other functions (Figures 6 and 7).




Fig. 5. Pareto Solutions for f1(X) and f2(X) (horizontal axis: Objective Function f1, 0 to 3000)



Reducing the dimensionality of the problem also aids analysis of the variation
in the variables for the Pareto set of solutions. Again, since for this example
the problem has been reduced to two objectives by the infeasibility approach, the
behaviour of the variables is much clearer than for the solutions obtained when all
the constraints are represented as objectives (Figures 8 and 9).

The ease with which the solutions from the infeasibility objective approach
can be interpreted is due, in part, to the problem being reduced to only two
objective functions; a higher number of objective functions necessarily results in
a multi-dimensional surface of solutions that is more difficult to interpret. However,
many real world multi-objective optimization problems are only concerned
with optimizing a few objectives, the number of constraints often far outnumbering
the objectives. It is also clear that the infeasibility objective approach is
an effective means of obtaining the Pareto optimal solutions for the true objectives
and can be used to reduce the dimensionality of the optimization problem
where detailed analysis of the constraint behaviour is not required.

Fig. 6. Pareto Solutions for the Constraint Objective Approach

Fig. 7. Pareto Solutions for the Infeasibility Objective Approach

Fig. 8. Pareto Variables for the Constraint Objective Approach

Fig. 9. Pareto Variables for the Infeasibility Objective Approach

4 Conclusions 

This paper introduces a means by which the constraint violations may be com- 
bined to form an infeasibility objective for use in multi-objective Pareto opti- 
mization. The infeasibility objective has the properties that it is zero for feasible 
solutions, but is positive for infeasible solutions, and increases in value with both 
the number of active constraints and the magnitude of each constraint violation. 
Both inequality and equality constraints can be included in the infeasibility ob- 
jective. 

It is shown that, in comparison to an approach in which each constraint
is treated as a separate objective, transforming the violated constraints into a
single infeasibility objective results in Pareto solutions that are more transparent
and easier to interpret. Using the infeasibility objective in a Pareto ranking
optimization results in solutions that are equivalent to the constrained Pareto
optimal solutions for only the true objective functions. This is in contrast to the
solutions obtained when each constraint is treated as a separate objective function,
which increases the dimension of the problem and limits the extent to which the
interrelationship between objectives and constraints can be interpreted.

The use of the infeasibility objective does not preclude the treatment of
a constraint function as an objective function if knowledge of the constraint's
effect is considered important. The approach is intended to allow treatment of
constraints as bounds on the problem when the only concern is that the solutions
are feasible; the constraint values across the Pareto optimal solutions are still
available for analysis.

It can be concluded that the infeasibility objective approach retains the
elegance of the existing Pareto ranking constrained optimization, but provides a
means by which the dimensionality of the problem can be reduced and,
consequently, the Pareto optimal solutions become more transparent. The method
also has the advantage over approaches based on the use of penalty functions
that it does not require any parameters, which removes the need to tune
it to a particular problem.

Abstract. One common characterization of how simple hill-climbing
optimization methods can fail is that they become trapped in local
optima - a state where no small modification of the current best solution
will produce a solution that is better. This measure of ‘better’ depends on 
the performance of the solution with respect to the single objective be- 
ing optimized. In contrast, multi-objective optimization (MOO) involves 
the simultaneous optimization of a number of objectives. Accordingly, 
the multi-objective notion of ‘better’ permits consideration of solutions 
that may be superior in one objective but not in another. Intuitively, 
we may say that this gives a hill-climber in multi-objective space more 
freedom to explore and less likelihood of becoming trapped. In this pa- 
per, we investigate this intuition by comparing the performance of sim- 
ple hill-climber-style algorithms on single-objective problems and multi- 
objective versions of those same problems. Using an abstract building- 
block problem we illustrate how ‘multi-objectivizing’ a single-objective 
optimization (SOO) problem can remove local optima. Then we investi- 
gate small instances of the travelling salesman problem where additional 
objectives are defined using arbitrary sub-tours. Results indicate that 
multi-objectivization can reduce local optima and facilitate improved 
optimization in some cases. These results enlighten our intuitions about 
the nature of search in multi-objective optimization and sources of diffi- 
culty in single-objective optimization. 



1 Introduction 

One of the most general heuristics used in optimization techniques is the idea
that the value of solutions is to some extent correlated with how similar the
solutions are; crudely, that a good solution is more likely to be found near
other good solutions than it is to be found near an arbitrary solution. Naturally,
'nearby' or 'similar' needs to be qualified. The simplest notion of similarity
of solutions is their proximity as measured in the problem parameters given. But
alternatively, we may define proximity in terms of the variation operators used
by the search algorithm [?]. In any case, the simplest way to use this heuristic is
a hill-climbing algorithm: start with some random solution, try variations of this
solution until a better solution (or at least, non-worse solution) is found, move
to this new solution and try variations of this, and so on. But the actual success
of a hill-climber requires a stronger assumption to be true: that from any point
in the solution space there is a path through neighbouring points to a global
optimum that is monotonically increasing in value¹. If this is true then a
hill-climber can find a global optimum - and, although a hill-climber can do better
than random guessing on almost all practical problems we encounter, it usually
does not find a global optimum. More likely, it gets stuck in a local optimum -
a sub-optimal point or plateau that has no superior neighbouring points.
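
The hill-climbing procedure and the way it becomes trapped can be sketched in Python. This is a toy illustration under stated assumptions: the one-dimensional landscape, the neighbourhood of adjacent integers, and the function names are all hypothetical, and this variant accepts only strictly better neighbours so that it is guaranteed to stop.

```python
def hill_climb(start, neighbours, value):
    """Move to the first strictly better neighbour until none exists."""
    current = start
    while True:
        better = [n for n in neighbours(current) if value(n) > value(current)]
        if not better:
            return current  # local (possibly global) optimum
        current = better[0]


# Hypothetical landscape: a local optimum at index 2 and the global optimum at index 10.
LANDSCAPE = [0, 1, 2, 1, 0, 1, 2, 3, 4, 5, 6]


def value(i):
    return LANDSCAPE[i]


def neighbours(i):
    """Adjacent positions on the line, clipped to the valid index range."""
    return [j for j in (i - 1, i + 1) if 0 <= j < len(LANDSCAPE)]


trapped = hill_climb(0, neighbours, value)    # stops at the local optimum
succeeded = hill_climb(5, neighbours, value)  # a monotonic path to the global optimum exists
```

From position 0 every path to the global optimum has to pass through the lower values at positions 3 and 4, so the strong assumption fails and the climber stops at position 2.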

There are several approaches that can be taken to overcome the limitations
of a simple hill-climber. Broadly, many approaches can be seen as one of the
following: changing the neighbourhood structure of the solution space so that
the strong assumption is true; or relaxing the strong assumption and, one way
or another, utilizing solutions which are inferior to some extent. Changing the
neighbourhood structure can be done by something as simple as increasing the
neighbourhood 'radius' by increasing mutation, or by a complete redesign of how
solutions are represented and new variants are created, or perhaps by adding
redundancy so that a hill-climber can travel along 'neutral networks' [?] to find
superior points without having to go through inferior points. Relaxing the strong
assumption can be done by, amongst other things, probabilistically accepting
inferior solutions, as in simulated annealing, or by the use of multi-point searchers,
or multi-restart searchers, where although one searcher may become stuck
another, at a different location, may continue.

In this paper, we investigate a different approach, similar to one previously
proposed in [10]. Rather than changing the neighbourhood structure so that
we can always find a superior point, or accepting search paths through inferior
points, we use a different definition of superior and inferior. Specifically, we use a
method of comparing two solutions that is common in multi-objective optimization
(MOO) techniques, where more than one measure of a solution is provided.
Briefly, under Pareto optimization, a solution x is superior to (said to Pareto
dominate) another solution x' if and only if it is at least as good as x' in all measures
and better in at least one measure. Put another way, if x' is better than x in at
least one measure then it is not “inferior” to x. Our intuition is this: if we can
add other objectives to a problem to make it multi-objective, and use this relaxed
notion of inferiority, then we may open up monotonically increasing paths
to the global optimum that are not available under the original single-objective
optimization (SOO) problem. We call this approach “multi-objectivization”.

Naturally, the effect of this transformation will depend, in part, on exactly 
how we ‘objectivize’ the problem, and the particulars of the algorithm that 
uses the new multi-objective problem space. To begin with, we illustrate the 
principle on a test function that has an obvious decomposition. We explain why 
decomposing this problem naturally leads to the removal of all local optima in 
the search space, and demonstrate this fact empirically with results showing that 
a Pareto hillclimber (PAES) can solve the problem much more efficiently than a
simple hillclimber. We also compare the performance of PAES with that of simulated
annealing on this problem, and show that increasing the freedom of movement
for a hill-climber, by decomposing a problem into multiple objectives, is more
effective than relaxing the strong assumption for a hill-climber, as in simulated
annealing. This illustrates an idealized multi-objectivization.

^1 ‘Efficient’ success also requires that this path is not exponentially long.

Following this, we take the well-known travelling salesperson problem (TSP) 
as an exemplar real-world problem with a single objective (to minimize the 
tour length) and show how it may be decomposed into sub-objectives. We then 
perform a number of experiments to measure the effectiveness of decomposing 
the problem, by comparing various single-point and multi-point hill-climbers on 
the original landscape and on the multi-objective landscape. Some comparison 
with simulated annealing is also provided. Several instances of the problem are 
considered, and we attempt to establish the effect the choice of different sub- 
objectives has on the efficacy of the resultant decomposition. 

The remainder of the paper is structured as follows: Section 2 defines the 
concepts of single and multi-objective optimization, and introduces the technique 
of multi-objectivization. Section 3 defines the algorithms that we use in our 
experiments, and Section 4 defines the test problems we use both in their SOO 
and MOO versions. Section 5 describes the results of the experiments. Section 6 
discusses implications and related research, and Section 7 concludes. 

2 Single-Objective and Multi-objective Optimization 

The general (unconstrained) single-objective combinatorial optimization problem
can be expressed as:

$$\text{maximize } f(x) \quad \text{subject to } x \in X \qquad (1)$$

where $x$ is a discrete solution vector, $X$ is a finite set of feasible solutions,
and $f(x)$ maps $X$ into $\mathbb{R}$.

Similarly, the multi-objective combinatorial optimization (MOCO) problem
can be expressed as:

$$\text{``maximize'' } \mathbf{f}(x) = (f_1(x), \ldots, f_K(x)) \quad \text{subject to } x \in X \qquad (2)$$

where the vector objective function $\mathbf{f}(x)$ maps $X$ into $\mathbb{R}^K$, and $K \ge 2$ is the
number of objectives. The term ‘maximize’ appears in quotation marks because,
in general, there does not exist a single solution that is maximal on all objectives.
Instead, one may seek to find a set of solutions $X^* \subseteq X$, called the Pareto optimal
set, with the property that:

$$\forall x^* \in X^* \,.\; \nexists\, x \in X \,.\; x \succ x^* \qquad (3)$$

where $x \succ x^* \iff (\forall i \in 1..K \,.\; f_i(x) \ge f_i(x^*)) \wedge (\exists i \in 1..K \,.\; f_i(x) > f_i(x^*))$.
The expression $x \succ x^*$ is read as x dominates x*, and solutions in the Pareto
optimal set are also known as efficient or admissible solutions. In addition, for two
solutions x and x', we say $x \sim x'$ if and only if
$\exists i \in 1..K \,.\; f_i(x) > f_i(x') \wedge \exists j \in 1..K \,.\; j \ne i \,.\; f_j(x') > f_j(x)$.
Such a pair of solutions are said to be incomparable,
and each is nondominated with respect to the other. Since any member of a set
of mutually nondominated solutions is not worse than (dominated by) any other, a
hillclimbing algorithm would be free to move from any one of them to any other
if the variation operator allows. This is the notion that is important to allow the
increased freedom of a Pareto hill-climber, and which may reduce the problem
of local optima.
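The dominance and incomparability relations above can be sketched in a few lines of Python. This is only an illustration under the maximization convention of this section; the function names `dominates` and `incomparable` are ours, not taken from the paper.

```python
from typing import Sequence


def dominates(fx: Sequence[float], fy: Sequence[float]) -> bool:
    """True if objective vector fx Pareto-dominates fy (all objectives maximized)."""
    at_least_as_good = all(a >= b for a, b in zip(fx, fy))
    strictly_better = any(a > b for a, b in zip(fx, fy))
    return at_least_as_good and strictly_better


def incomparable(fx: Sequence[float], fy: Sequence[float]) -> bool:
    """True if each vector is strictly better than the other in some objective."""
    fx_better_somewhere = any(a > b for a, b in zip(fx, fy))
    fy_better_somewhere = any(b > a for a, b in zip(fx, fy))
    return fx_better_somewhere and fy_better_somewhere
```

With a single objective, `dominates` reduces to a strict greater-than comparison and no two solutions are ever incomparable, which is why the extra freedom of movement only appears once a second objective is added.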

2.1 Multi-objectivization 

To perform multi-objectivization we must either replace the original single objec- 
tive of a problem with a set of new objectives, or add new objectives in addition 
to the original function. In either case, we want to be sure that the global op- 
timum of the original problem is one of the points that is Pareto optimal in 
the multi-objective version of the problem. Specifically, the problem must be
restated so as to maximize $K \ge 2$ objective functions such that the following
relation between solutions in the two formulations holds:

$$\forall x^{opt} \in X^{opt} \,.\; \exists x^{*} \in X^{*} \,.\; x^{*} = x^{opt} \qquad (4)$$

where $x^{opt}$ is an optimal solution to the SOO problem, $X^{opt}$ is the set of all
such solutions, and $x^*$ and $X^*$ relate to the MOO formulation of the problem
and have the meanings attributed above.

Part of the intuition that motivates multi-objectivization comes from notions 
of problem decomposition - dividing a problem into a number of smaller sub- 
problems and solving each and all of them. Accordingly, it may be appropriate 
to define an objective as a function similar to the original function but over a 
subset of the problem parameters. For example, in our TSP example we add ob- 
jectives corresponding to the length of parts of the tour. Defining functions over 
subsets of the problem parameters has clear connections with divide and conquer
techniques, especially dynamic programming. An alternative approach
to multi-objectivization is to define different functions over the same (entire) set
of problem parameters. For example, to take a different domain, if we desire a
locomotion controller for a legged robot we might suppose that it needs to both
lift its body off the floor, and swing its legs forward, both of which may depend
on all of the parameters. 

The skill of the researcher in either approach (similar functions over subsets, 
or different functions over the entire set) is to separate out the conflicting aspects 
of the problem - to find objectives that are as independent as possible. This is 
not always easy or possible. However, we suppose that in some circumstances 
the multi-objectivization approach may have useful tolerance of ‘lazy’ decom- 
positions, where sub-problems are not completely independent. Perhaps this is 
because a solution that is better in all objectives is preferred if available, but a 
solution that exploits one objective at the expense of another objective is still 
valuable in the case that a dominant solution cannot be found. 

In the two examples introduced later, the first uses different functions over the 
entire parameter set, and the second, TSP, uses similar functions over subsets 
of the solution. The examples serve to illustrate the different approaches to
multi-objectivization and the issues involved, and show some cases where
multi-objectivization can be successful.

3 Algorithms

In order to test the hypothesis that multi-objectivizing a problem can reduce the 
number of local optima in a way that is useful for performing local search, we 
employ a number of simple neighbourhood search algorithms described below. 
The first pair of algorithms, consisting of a simple hillclimber (SHC) and a
multi-objective hillclimber similar in operation to the Pareto archived evolution
strategy (PAES) [?], are both single-point hill-climbers. PAES represents the
multi-objective analogue of SHC: both algorithms accept a neighbour of the
current solution if it is not worse than any solution found so far. In the context
of PAES, however, ‘worse’ means dominated.

The second pair of algorithms, a mutation-only genetic algorithm
with deterministic crowding [?] (DCGA) and the Pareto envelope-based selection
algorithm (PESA) [2], are both multi-point hill-climbers (neither uses
recombination here). Once again, they are supposed to be analogues of each
other, subject to the differences forced upon them by the different requirements
of single and multiple objective optimization. The analogy between them is not
as clear as between SHC and PAES because the selection and crowding methods
they use are more complicated, but they are sufficiently similar in operation
to warrant comparison for our purposes here.

The performance of the two pairs of algorithms above is also compared with
a simulated annealing (SA) algorithm. This comparison is intended to place the 
effect of reducing the number of local optima achieved by multi-objectivization 
into a context which is familiar to the reader. SA incrementally adjusts the strict- 
ness with which it rejects moves to worse solutions, and does so very effectively; 
it thus serves as a useful comparison to the effect of multi-objectivization. 

3.1 Single-Point Hill-Climbers

Initialization:  B ← ∅
                 x ∈ X ← Init(x)
                 B ← B ∪ x
Main Loop:       x' ∈ X ← Mutate(x)
                 if (Inferior(x', B) ≠ TRUE) {
                     x ← x'
                     B ← Reduce(x' ∪ B) }
Termination:     return Best(B)

Fig. 1. Generic pseudocode for hill-climbing algorithms



Pseudocode for a generic version of a hill-climbing algorithm that may be single
or multi-objective is given in Figure 1. The current solution vector is denoted
by x, and B is the minimal representation of the best solutions encountered so
far. The function Mutate(x) returns a new solution x' made by variation of x.
For the simple, single-objective hillclimber, the functions Inferior(), Reduce() and
Best() are very simple. The function Inferior(x', B) returns true iff there is any
element of the set B whose evaluation is greater than that of x', Reduce(B) returns
the set of equally maximum value elements of the set B, and Best(B) returns
any member of B, because they are equally good. For the PAES-based
multi-objective hill-climber, these functions have Pareto versions that follow the same
semantics: Inferior(x', B) returns true iff there is any element of the set B that
dominates x', Reduce(B) returns the set of elements from B that are not
dominated by any other member of B, and Best(B) returns the element of B that
is maximal in the original single objective. In our experiments here, both SHC
and PAES are terminated when the number of function evaluations reaches a
predetermined number, num_evals.
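The generic loop of Figure 1 can be sketched in Python as follows. This is a minimal illustration, not the implementation used in the experiments: the names `hill_climb`, `flip_one_bit` and `max_archive` are hypothetical, and the archive is truncated oldest-first purely for simplicity (an assumption; PAES itself uses a crowding-based rule). Passing a one-element objective tuple gives SHC behaviour, and passing several objectives gives the Pareto version.

```python
import random
from typing import Callable, List, Tuple

Solution = Tuple[int, ...]
Objectives = Tuple[float, ...]


def dominates(fa: Objectives, fb: Objectives) -> bool:
    """Pareto dominance for maximization; with one objective this is just fa > fb."""
    return all(a >= b for a, b in zip(fa, fb)) and any(a > b for a, b in zip(fa, fb))


def hill_climb(
    evaluate: Callable[[Solution], Objectives],
    original: Callable[[Solution], float],
    init: Callable[[random.Random], Solution],
    mutate: Callable[[Solution, random.Random], Solution],
    num_evals: int,
    rng: random.Random,
    max_archive: int = 100,  # assumption: oldest-first truncation, not PAES's crowding rule
) -> Solution:
    x = init(rng)
    archive: List[Tuple[Solution, Objectives]] = [(x, evaluate(x))]  # the set B
    for _ in range(num_evals - 1):
        y = mutate(x, rng)
        fy = evaluate(y)
        # Inferior(x', B): some archived solution dominates the candidate.
        if any(dominates(fb, fy) for _, fb in archive):
            continue
        x = y
        # Reduce(x' U B): drop members the newcomer dominates, then add it once.
        archive = [(b, fb) for b, fb in archive if not dominates(fy, fb) and b != y]
        archive.append((y, fy))
        archive = archive[-max_archive:]
    # Best(B): the archived solution that is maximal in the original objective.
    return max((b for b, _ in archive), key=original)


def flip_one_bit(x: Solution, rng: random.Random) -> Solution:
    """Hypothetical mutation operator for bit strings: flip one random position."""
    i = rng.randrange(len(x))
    return x[:i] + (1 - x[i],) + x[i + 1:]
```

The only difference between the two hill-climbers in this sketch is the length of the tuple returned by `evaluate`; the acceptance rule itself is unchanged.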

3.2 Multi-point Hill-Climbers

Initialization:  P ← ∅
                 Init_pop(P)
Main Loop:       x_1 ← Rand_mem(P), x_2 ← Rand_mem(P)
                 x'_1 ← Mutate(x_1), x'_2 ← Mutate(x_2)
                 if (H(x_1, x'_1) + H(x_2, x'_2) > H(x_1, x'_2) + H(x_2, x'_1))
                     Swap(x'_1, x'_2)
                 if (f(x'_1) ≥ f(x_1))
                     P ← P ∪ x'_1 \ x_1
                 if (f(x'_2) ≥ f(x_2))
                     P ← P ∪ x'_2 \ x_2
Termination:     return Best(P)

Fig. 2. A simple form of a genetic algorithm using deterministic crowding (DCGA) as
used in our experiments



Pseudocode for the mutation-only, deterministic crowding genetic algorithm
(DCGA) is given in Figure 2. The set P is the population of candidate solutions,
initialized randomly using Init_pop(P). The function H(x, x') measures
the genotypic Hamming distance between two solution vectors. At each iteration,
two parents are selected from the population at random and two offspring
are generated by mutation, one from each parent. Parents and offspring are then
paired up so as to minimize the sum of the genotypic Hamming distances between
them. Each offspring then replaces the parent it is paired with if it is non-worse
than that parent.
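One iteration of the DCGA main loop described above might look like this in Python. It is a sketch under stated assumptions: the two parents are drawn as distinct population members (Figure 2 does not say whether Rand_mem may return the same member twice), fitness is maximized, and the helper names `hamming`, `flip_one_bit` and `dcga_step` are ours.

```python
import random
from typing import Callable, List, Tuple

Solution = Tuple[int, ...]


def hamming(a: Solution, b: Solution) -> int:
    """Genotypic Hamming distance H(x, x')."""
    return sum(u != v for u, v in zip(a, b))


def flip_one_bit(x: Solution, rng: random.Random) -> Solution:
    """Hypothetical mutation operator for bit strings."""
    i = rng.randrange(len(x))
    return x[:i] + (1 - x[i],) + x[i + 1:]


def dcga_step(
    pop: List[Solution],
    f: Callable[[Solution], float],
    mutate: Callable[[Solution, random.Random], Solution],
    rng: random.Random,
) -> None:
    """Apply one deterministic-crowding iteration to pop, in place."""
    i, j = rng.sample(range(len(pop)), 2)  # assumption: two distinct parents
    p1, p2 = pop[i], pop[j]
    c1, c2 = mutate(p1, rng), mutate(p2, rng)
    # Pair each offspring with the parent that minimizes the summed distance.
    if hamming(p1, c1) + hamming(p2, c2) > hamming(p1, c2) + hamming(p2, c1):
        c1, c2 = c2, c1
    # An offspring replaces its paired parent only if it is non-worse.
    if f(c1) >= f(p1):
        pop[i] = c1
    if f(c2) >= f(p2):
        pop[j] = c2
```

Because a slot is only ever overwritten by a non-worse offspring, the fitness held in each population slot never decreases from one iteration to the next.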

The PESA algorithm used here has been described in detail in [2], and
pseudocode for it is given in Figure 3. It has an internal population IP of size $P_I$, and
an external population of nondominated solutions EP. Here it is used without
crossover so that each generation consists of selecting $P_I$ parents from EP and
mutating them to produce $P_I$ new offspring. Then, the nondominated members
of IP are incorporated into EP. The selection operator, Select(), is based on
crowding in objective space.

Initialization:  IP ← ∅, EP ← ∅
                 Init_pop(IP)
                 foreach (x ∈ IP)
                     EP ← Reduce(EP ∪ x)
Main Loop:       IP ← ∅
                 while (|IP| < P_I) {
                     x ← Select(EP)
                     x' ← Mutate(x)
                     IP ← IP ∪ x' }
                 foreach (x ∈ IP)
                     EP ← Reduce(EP ∪ x)
Termination:     return Best(EP)

Fig. 3. The Pareto envelope-based selection algorithm (PESA)

The PESA and DCGA algorithms are quite different in some ways, but do
represent analogues of each other at a high level: both algorithms have a population
from which parents are selected and used to produce offspring via mutation,
and both have a mechanism for maintaining diversity in the population. However,
in PESA, the diversity is maintained in the objective space, while with DCGA
it is in the genotype space. With PESA, the initial size of the pool of random
solutions is $P_I$, and $P_I$ solutions are generated at each step. The prevailing size
of EP is the size of the pool from which solutions can be selected. In DCGA,
the population size P determines the initial pool and the size of the pool from
which solutions are selected, and 2 solutions are generated at each step. These
differences in detail, however, should not affect the substance of the judgments
we make about the performance of these algorithms in our experiments.



3.3 Simulated Annealing 

The simulated annealing (SA) algorithm used is identical to the SHC, except
for changes to the acceptance function to allow it to accept moves to worse
solutions. The acceptance function we employ for accepting degrading moves is
the standard exponential function:

$$p(\text{accept } x') = \exp\left(\frac{f(x) - f(x')}{T}\right) \qquad (5)$$

where $f(x)$ is the evaluation of the current solution x, $f(x')$ is the evaluation
of the neighbouring solution x', and p(accept x') denotes the probability of
accepting x'. As written, the exponent is negative for a degrading move when the
objective is being minimized (as in the TSP); for a maximization problem the
sign of the difference is reversed.

We choose to use a simple form of simulated annealing, employing an inhomogeneous,
geometric cooling schedule where T is updated after every iteration
using:

$$T \leftarrow \alpha T \qquad (6)$$

Following general procedures outlined in [?], we attempt to set the starting
temperature $T_0$ so that between 40% and 60% of moves are accepted. The final
temperature $T_f$ is set so that between 0.1% and 2% of moves are accepted. We can
then calculate $\alpha$ using:

$$\alpha = \exp\left(\frac{\ln(T_f / T_0)}{num\_evals}\right) \qquad (7)$$

given that we know the total number of function evaluations num_evals required.
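The cooling rate and the acceptance rule can be checked numerically with a short Python sketch. The function names are hypothetical, and the acceptance function is written for a minimization problem (an assumption about the sign convention, under which a degrading move has a larger evaluation than the current solution).

```python
import math


def cooling_rate(t0: float, tf: float, num_evals: int) -> float:
    """Geometric cooling factor alpha such that t0 * alpha**num_evals == tf."""
    return math.exp(math.log(tf / t0) / num_evals)


def accept_probability(f_current: float, f_neighbour: float, temperature: float) -> float:
    """Probability of accepting the neighbour, assuming f is being minimized."""
    if f_neighbour <= f_current:
        return 1.0  # non-degrading moves are always accepted
    return math.exp((f_current - f_neighbour) / temperature)
```

For example, `cooling_rate(10.0, 0.1, 1000)` returns the factor that takes the temperature from 10 down to 0.1 in exactly 1000 updates; the temperatures here are illustrative values, not those used in the experiments.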
4 Problems

4.1 H-IFF/MH-IFF: An Abstract Illustration

The Hierarchical-if-and-only-if function, H-IFF, is a genetic algorithm test problem
designed to model a problem with a building-block structure where the
sub-problem that each block represents has strong interdependencies with other
blocks. That is, unlike many existing building-block problems, the optimal
solution to any block in H-IFF is strongly dependent on how other building-blocks
have been solved [?].

The fitness of a string using H-IFF can be defined using the recursive function 
given below. This function interprets a string as a binary tree and recursively 
decomposes the string into left and right halves. Each resultant sub-string con- 
stitutes a building-block and confers a fitness contribution equal to its size if all 
the bits in the block have the same allele value - either all ones or all zeros. The 
fitness of the whole string is the sum of these fitness contributions for all blocks 
at all levels. 



$$f(B) = \begin{cases} 1, & \text{if } |B| = 1, \text{ else} \\ |B| + f(B_L) + f(B_R), & \text{if } (\forall i \{b_i = 0\} \text{ or } \forall i \{b_i = 1\}), \\ f(B_L) + f(B_R), & \text{otherwise,} \end{cases} \qquad (8)$$

where B is a block of bits, $\{b_1, b_2, \ldots, b_n\}$, |B| is the size of the block = n, $b_i$
is the ith element of B, and $B_L$ and $B_R$ are the left and right halves of B (i.e.
$B_L = \{b_1, b_2, \ldots, b_{n/2}\}$ and $B_R = \{b_{n/2+1}, \ldots, b_n\}$). The length of the string
evaluated, n, must equal $2^p$ where p is an integer (the number of hierarchical
levels).
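The recursive definition translates directly into Python. The sketch below is illustrative (the name `hiff` and the input check are ours); for a 64-bit string of all ones or all zeros it gives 448, since each of the seven block sizes from 1 to 64 contributes 64.

```python
from typing import Sequence


def hiff(bits: Sequence[int]) -> int:
    """H-IFF fitness of a bit string whose length is a power of two."""
    n = len(bits)
    if n == 0 or n & (n - 1):
        raise ValueError("string length must be a power of two")
    if n == 1:
        return 1
    half = n // 2
    below = hiff(bits[:half]) + hiff(bits[half:])
    # A block earns a bonus equal to its size only if all its bits agree.
    if all(b == bits[0] for b in bits):
        return n + below
    return below
```

A string made of 32 zeros followed by 32 ones scores 384: every block below the top level is uniform, and only the top-level bonus of 64 is lost.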

Each of the two competing solutions to each block (all-ones and all-zeros) gives
an equal fitness contribution on average. But only when neighbouring blocks match
do they confer a bonus fitness contribution by forming a correct block at the next
level in the hierarchy. These competing solutions and their interdependencies
create strong epistatic linkage in H-IFF and many local optima (see Figure 4, left).
These local optima prevent any kind of mutation-based hill-climber from reliably
reaching one of the two global optima in H-IFF (all-ones or all-zeros) in time
less than exponential in N, the number of bits in the problem [?].

Figure 4 (left) shows a particular section through the H-IFF landscape. This 
is the section through a 64-bit landscape starting from all zeros on the left and 
ending with all ones. Specifically, it shows the fitness of the strings “000 . . . 0”, 
“100 ... 0”, “110 ... 0”, ... , “111 ... 1”. This indicates the local optima in H-IFF 
and the two global optima at opposite corners of the space. 

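The two objectives of the multi-objectivized version, MH-IFF, are defined as:

$$f_k(B) = \begin{cases} 0, & \text{if } |B| = 1 \text{ and } b_1 \ne k, \text{ else} \\ 1, & \text{if } |B| = 1 \text{ and } b_1 = k, \text{ else} \\ |B| + f_k(B_L) + f_k(B_R), & \text{if } (\forall i \{b_i = k\}), \\ f_k(B_L) + f_k(B_R), & \text{otherwise,} \end{cases} \qquad (9)$$

where $f_0(x)$ is the first objective and $f_1(x)$ is the second.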
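A Python sketch of the two MH-IFF objectives follows; the function names are ours and the code simply mirrors the four cases of the definition.

```python
from typing import Sequence, Tuple


def mhiff_objective(bits: Sequence[int], k: int) -> int:
    """Objective f_k: credit only blocks whose bits all equal k."""
    n = len(bits)
    if n == 0 or n & (n - 1):
        raise ValueError("string length must be a power of two")
    if n == 1:
        return 1 if bits[0] == k else 0
    half = n // 2
    below = mhiff_objective(bits[:half], k) + mhiff_objective(bits[half:], k)
    if all(b == k for b in bits):
        return n + below
    return below


def mhiff(bits: Sequence[int]) -> Tuple[int, int]:
    """Return the objective vector (f_0, f_1)."""
    return mhiff_objective(bits, 0), mhiff_objective(bits, 1)
```

For any string the two objectives sum to its H-IFF value, because each uniform block is credited to exactly one of them. This is why the two global optima of H-IFF, all zeros and all ones, sit at the two ends of the MH-IFF Pareto front.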

This particular decomposition of H-IFF results in a two-objective problem in
which there are no local optima for a multi-objective hill-climber. Figure 4 (right)
is a section through the two-objective landscape, MH-IFF, using the same strings
as used in the section through H-IFF. We see that if neighbouring points on this
section are worse with respect to one dimension, they are better in the other. This
has transformed the problem into one that is now completely bit-climbable.
That is, single bit changes can move a hill-climber from one end of the Pareto
front to the other, including every point in between, without ever moving to a
dominated point. (In fact, every Pareto optimal solution can be reached by
bit-climbing from any start point.) This transformation is possible because we have
access to the structure of the function and can thereby separate the conflicting
sub-goals inherent in the SOO version. This serves to illustrate the mechanisms
of multi-objectivization - but naturally, we will not have this luxury in practical
applications, like the TSP.

[Figure 4: two panels, "The H-IFF Landscape" (left) and "The MH-IFF Landscape" (right); horizontal axis: number of leading 1s]

Fig. 4. H-IFF and MH-IFF



4.2 TSP/MTSP: Decomposition via Multi-objectivization 

The travelling salesperson problem (TSP) is the most well-known of all NP-hard
optimization problems. For a comprehensive review and comparison of methods
used to solve it see [?], where the problem is stated as follows: We are given a set
$C = \{c_1, c_2, \ldots, c_N\}$ of cities and for each pair $\{c_i, c_j\}$ of distinct cities there is
a distance $d(c_i, c_j)$. Our goal is to find an ordering $\pi$ of the cities that minimizes
the quantity

$$\sum_{i=1}^{N-1} d(c_{\pi(i)}, c_{\pi(i+1)}) + d(c_{\pi(N)}, c_{\pi(1)}). \qquad (10)$$

In order to multi-objectivize the TSP, we need to identify sub-problems to
be solved. Of course, the TSP is NP-hard for the very reason that there is no
good decomposition of the problem, i.e. dependencies between components of the
problem exist in most instances. However, an obvious decomposition, although
by no means perfect, is simply to divide the problem into two (or more) sub-tours,
each to be minimized. This can be done in a number of ways, and some
may be preferable to others depending on how much is known about the problem
instance, but here we use a method that is general for the TSP class. Specifically,
to make a two-objective formulation of the TSP, we define two distinct sub-tours
to be minimized. The sub-tours are defined by two cities, and the problem
becomes:

$$\begin{aligned} \text{``minimize''}\quad & \mathbf{f}(\pi, a, b) = (f_1(\pi, a, b), f_2(\pi, a, b)) \\ \text{where}\quad & f_1(\pi, a, b) = \sum_{i=\pi^{-1}(a)}^{\pi^{-1}(b)-1} d(c_{\pi(i)}, c_{\pi(i+1)}) \\ \text{and}\quad & f_2(\pi, a, b) = \sum_{i=\pi^{-1}(b)}^{N-1} d(c_{\pi(i)}, c_{\pi(i+1)}) + \sum_{i=1}^{\pi^{-1}(a)-1} d(c_{\pi(i)}, c_{\pi(i+1)}) + d(c_{\pi(N)}, c_{\pi(1)}) \end{aligned} \qquad (11)$$

where a and b are the two cities specified a priori, $\pi^{-1}(c)$ denotes the position of
city c in the ordering, and if $\pi^{-1}(a) > \pi^{-1}(b)$ the two cities are
swapped. It is intended that a and b are chosen arbitrarily. Notice that the sum
of the two objectives is the same as the quantity to be minimized in (10). This
ensures that the optimum of (10) is coincident with at least one of the Pareto
optima of (11), as required by our definition of multi-objectivizing.
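The two-objective decomposition can be sketched in Python as below. This is an illustration under our own conventions: tours are lists of city indices, `dist` is a symmetric distance matrix, positions are 0-based, and the function names are hypothetical. The first objective is the length of the path from a to b along the tour, and the second is the length of the rest of the cycle.

```python
from typing import List, Sequence, Tuple


def tour_length(tour: Sequence[int], dist: Sequence[Sequence[float]]) -> float:
    """Length of the closed tour, including the edge back to the start."""
    n = len(tour)
    return sum(dist[tour[i]][tour[(i + 1) % n]] for i in range(n))


def mtsp_objectives(
    tour: Sequence[int], dist: Sequence[Sequence[float]], a: int, b: int
) -> Tuple[float, float]:
    """Split the tour length into two sub-tour lengths defined by cities a and b."""
    n = len(tour)
    pa, pb = tour.index(a), tour.index(b)
    if pa > pb:  # swap so that the first sub-tour runs forward through the list
        pa, pb = pb, pa

    def edge(i: int) -> float:
        return dist[tour[i]][tour[(i + 1) % n]]

    f1 = sum(edge(i) for i in range(pa, pb))
    # Remainder: from the second city to the end, the closing edge, then up to the first.
    f2 = sum(edge(i) for i in range(pb, n)) + sum(edge(i) for i in range(pa))
    return f1, f2
```

Since every edge of the tour is counted in exactly one of the two sums, `f1 + f2` always equals `tour_length(tour, dist)`, which is the property that keeps the single-objective optimum on the Pareto front.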

5 Results 

5.1 H-IFF/MH-IFF 

In this section, we present a comparison of hillclimbing algorithms on the H-IFF
problem, and its multi-objectivized formulation, MH-IFF. All algorithms were
run on a 64-bit version of the problem, for the same number of function evaluations,
num_evals = 500000. Two different mutation rates, $p_m = 1/n$ and $2/n$ with $n = 64$,
were used in each algorithm to investigate whether mutation rate choice substantially
affected any of the algorithms. The choice of archive size for PAES
was 100. The DCGA and PESA algorithms require population sizes to be set.
For DCGA, P = 100, and for PESA, $P_I$ = 10. For the simulated annealing
algorithm, preliminary runs were performed and $T_0$ and $T_f$ were adjusted until
the acceptance probabilities fell into the required ranges described in Section 3.3.
Table 1 shows the full set of results collected on the 64-bit H-IFF problem. In
each case, the results were gathered from 30 independent runs of the algorithm.
For all algorithms, all best genotypically different solutions were stored off-line,
and the tabulated results relate to these off-line results. The table clearly indicates
that on H-IFF, removing the local optima through multi-objectivization
transforms the problem from one that is very difficult for a neighbourhood
searcher, to one that is much easier. The performance of the SOO hill-climbers,
SHC and DCGA, improves with increased mutation rate $p_m$, confirming that they require
larger steps to escape the local optima in the H-IFF problem. The multi-objective
algorithms are almost unaffected by the choice of $p_m$, indicating that they are
able to reach the optima through small neighbourhood moves. These results
demonstrate the multi-objectivization principle clearly, and can be understood
with reference to the discussion given to the problem in Section 4.1. In the following
section we examine how the multi-objectivization technique established here
fares on a real-world problem with a much less readily decomposable structure,
the TSP.

Table 1. Results of the comparison between algorithms on a 64-bit H-IFF problem.
Two of the algorithms are multi-objective and use the MH-IFF decomposition of the
problem, namely PAES and PESA. The other three algorithms use the H-IFF objective
function directly. Essentially the results compare two single-point hill-climbers, SHC
and PAES, and two multi-point hill-climbers, DCGA and PESA. In both cases, the
multi-objective algorithm significantly outperforms (using any statistical test) its SOO
counterpart. The results of the simulated annealing algorithm (SA) act as a benchmark,
indicating the level of performance that can be achieved when escape from local optima
is made possible on the original landscape. The columns, ‘% one’ and ‘% both’, indicate
the percentage of the runs where respectively one of the optima and both optima were
found over the thirty independent runs of each algorithm. Note that only PAES and
PESA are able to find both optima.

Algorithm  p_m   best   mean     σ       % one   % both
SHC        1/n   288    242.13   22.52     0       0
           2/n   336    267.47   29.46     0       0
PAES       1/n   448    415.20   51.26    70      47
           2/n   448    418.13   50.68    74      43
DCGA       1/n   300    270.06   13.80     0       0
           2/n   448    323.93   26.54     3       0
PESA       1/n   448    448.00    0.00   100     100
           2/n   448    448.00    0.00   100     100
SA         1/n   448    435.20   26.04    80       0
           2/n   448    435.20   26.04    80       0


5.2 TSP/MTSP 

We present results for a range of TSP instances of varying size and type. All
problems are symmetric, i.e. the distance from A to B is the same as from B to A,
where A and B are any two cities. The RAN-20 and RAN-50 problems have 20
and 50 cities respectively, and are non-Euclidean random weight problems where
the distance between each pair of cities is a random real number in [0,1). The
EUC-50 and EUC-100 are two randomly generated Euclidean problems where
the cities are given co-ordinates in a 2-d plane, x ∈ [0,1), y ∈ [0,1), and the
distance between pairs of nodes is then the Euclidean distance. The problem
kroB100 is taken from TSPLIB and is also a 100-node Euclidean TSP problem.
The last problem, mnPeano-92, is a 92-node fractal problem^2 with a known
optimal solution [?].

In each algorithm, the representation, initialization, and mutation operators
used are identical: the representation of a tour is an N-gene permutation of the
numbers 1..N; the initialization procedure simply generates a random permutation
of N cities; and the mutation operator used is the 2-change operator,
originally put forward in [?]. It works by selecting two non-identical cities and
reversing the order of all the cities between (and including) them. This operator
preserves all but two edges in the tour.
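The 2-change move described above is a segment reversal, and can be sketched in Python as follows. The function names are ours, and tours are represented as lists of city labels.

```python
import random
from typing import List, Sequence


def reverse_segment(tour: Sequence[int], i: int, j: int) -> List[int]:
    """Return a copy of tour with positions i..j (inclusive) reversed."""
    tour = list(tour)
    return tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]


def two_change(tour: Sequence[int], rng: random.Random) -> List[int]:
    """Pick two distinct positions at random and reverse the cities between them."""
    i, j = sorted(rng.sample(range(len(tour)), 2))
    return reverse_segment(tour, i, j)
```

For example, reversing positions 1 to 3 of `[0, 1, 2, 3, 4, 5]` gives `[0, 3, 2, 1, 4, 5]`: the edges 0-1 and 3-4 are replaced by 0-3 and 1-4, and in a symmetric problem every other edge is unchanged.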

^2 We conjecture that fractal TSP problems may be particularly suitable for
multi-objectivization because their inherently hierarchical structure suggests they may be
amenable to decomposition.

On each different TSP instance, all the algorithms are run for the same
number of function evaluations, num_evals, given for each problem in Table 2.
The PESA and DCGA algorithms further require population sizes to be set. As
before, we use the default values of P = 100 for DCGA, and $P_I$ = 10 for PESA.
Setting the SA parameters is carried out as before using preliminary runs to
derive an appropriate $T_0$ and $T_f$.

The choice of city pairs to be used in the multi-objective algorithms is investigated.
First, we present results (Table 2) in which - for the multi-objective
algorithms, PAES and PESA - a single, random pair of cities was selected and
this pair used in all runs. Later, we report the maximum deviation from these results
for other choices of city pairs. Finally, we investigate the choice of city pairs
using the EUC-50 problem by selecting a pair of cities that are very close, and
another pair where they are maximally distant, and repeating our experiments
for these choices.



Table 2. Summary TSP results. In the ‘Optimum’ column, the figures for kroB100 and
mnPeano-92 are known optimum values; other figures are estimates. For the RAN-20
problem, the optimum is an estimate based on the fact that SA reached this value on 30
consecutive runs, and given the small size of the problem. For the RAN-50 problem,
the estimated figure is based on the expected limiting value of an optimal tour [?],
and similarly for EUC-50 and EUC-100, the estimates are based on the formula for
expected tour length = $K\sqrt{NA}$ with N the number of cities, A = 1.0 the area in which
the cities are placed, and K ≈ 0.7124 [?].

Algorithm  Problem      num_evals  Optimum    Best       Mean      σ
SHC        RAN-20         500000   2.547394   2.550811   2.81      0.14
PAES       RAN-20         500000   2.547394   2.547394   2.66      0.14
SA         RAN-20         500000   2.547394   2.547394   2.55      0.00
SHC        RAN-50         500000   2.0415     2.620087   3.09      0.28
PAES       RAN-50         500000   2.0415     2.259948   2.73      0.22
DCGA       RAN-50         500000   2.0415     2.307587   2.46      0.09
PESA       RAN-50         500000   2.0415     2.189421   2.32      0.28
SA         RAN-50         500000   2.0415     2.130675   2.30      0.10
SHC        EUC-50         500000   5.0374     5.904673   6.23      0.20
PAES       EUC-50         500000   5.0374     5.801026   6.03      0.13
DCGA       EUC-50         500000   5.0374     5.707789   5.76      0.05
PESA       EUC-50         500000   5.0374     5.692169   5.78      0.08
SA         EUC-50         500000   5.0374     5.692169   5.72      0.03
SHC        EUC-100       2000000   7.124      8.143720   8.55      0.23
PAES       EUC-100       2000000   7.124      8.028227   8.35      0.24
DCGA       EUC-100       2000000   7.124      7.902731   8.16      0.14
PESA       EUC-100       2000000   7.124      7.795515   7.97      0.10
SA         EUC-100       2000000   7.124      7.853258   7.98      0.07
PESA       kroB100       2000000   22141      22141      22546.1   324.2
SA         kroB100       2000000   22141      22217      22529.2   173.0
SHC        mnPeano-92    1000000   5697.93    5857.47    6433.45   197.1
PAES       mnPeano-92    1000000   5697.93    5879.63    6255.30   197.6
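As a check on the estimated optima for the two random Euclidean problems, the expected tour length formula quoted in the caption of Table 2 can be evaluated directly with K ≈ 0.7124 and A = 1.0:

$$K\sqrt{NA} = 0.7124 \times \sqrt{50 \times 1.0} \approx 0.7124 \times 7.0711 \approx 5.0374 \quad (N = 50)$$

$$K\sqrt{NA} = 0.7124 \times \sqrt{100 \times 1.0} = 0.7124 \times 10 = 7.124 \quad (N = 100)$$

These agree with the ‘Optimum’ entries listed for EUC-50 and EUC-100.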
From Table 2 we can see that, without exception, the results of the PAES
algorithm are superior to those of the SHC algorithm at a statistically significant level^3,
over the range of problem instances in the table. This shows that the number of
local optima in the TSP problem, using the 2-change neighbourhood, is reduced
by the method of multi-objectivization we have proposed.
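The kind of comparison referred to here can be illustrated with a large-sample z statistic for the difference of two means. The sketch below is an assumption about the form of the test (the text names only a large-sample test at the 95% level, without giving a formula or saying whether it is one- or two-sided), and it treats the tabulated σ values as sample standard deviations over 30 runs. The figures used are the RAN-20 means and deviations for SHC and PAES from Table 2.

```python
import math


def two_sample_z(mean_a: float, sd_a: float, n_a: int,
                 mean_b: float, sd_b: float, n_b: int) -> float:
    """Large-sample z statistic for the difference between two means."""
    return (mean_a - mean_b) / math.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)


# RAN-20, Table 2: SHC mean 2.81 (sd 0.14), PAES mean 2.66 (sd 0.14), 30 runs each.
z = two_sample_z(2.81, 0.14, 30, 2.66, 0.14, 30)
significant_at_95 = z > 1.96  # two-sided critical value of the standard normal
```

Here z is roughly 4.15, well above 1.96, so under these assumptions the difference in mean tour length on RAN-20 would be judged significant.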

Although the number of local optima has successfully been reduced by multi- 
objectivization, this does not make PAES more effective than other methods for 
solving TSP. We are only using PAES to indicate the basic advantage of multi- 
objectivization over hill-climbing in a SOO problem. Compared with DCGA, 
PAES performs poorly on all problems. However, PESA, the multi-objective 
counterpart of DCGA, outperforms DCGA on all but one of the problems. PESA 
is also competitive with the SA algorithm, which has its own explicit means 
of escaping from local optima. This result adds further evidence that multi- 
objectivization can enable an algorithm to avoid local optima effectively. 

To investigate the effect of the choice of city pairs, the 50-node Euclidean
TSP problem was used. First, a city pair in which the cities were very
close to each other and relatively far from others was selected. Thirty runs were
performed using this pair and the results were: best = 5.804719, mean = 6.10,
and σ = 0.22. Then, with cities in opposite corners of the plane the results
were: best = 5.818185, mean = 6.09, σ = 0.16. So both are worse than the
‘random’ choice used in the results in Table 2: best = 5.801026, mean = 6.03,
σ = 0.13. However, although there is some apparent dependence on city choice,
all three of these results are better than the results of the SHC algorithm on
this problem. Some alternative choices of city pairs were used on some of the
other problems, too. On the random 20-node problem, three different choices of
node pair were tried. There was a 1.5% variation in the mean over 30 runs, for
the different node-pair choices. On the 50-node Euclidean problem, two different
pairs of cities were chosen for the PESA algorithm. The difference in means over
30 runs was 0.4%. On no runs did the choice of city pair affect the mean TSP
tour length by more than 2%.

6 Discussion 

The examples in the previous sections have illustrated different ways in which 
additional objectives can be defined for a SOO problem and that this multi- 
objectivization can facilitate improved search for neighbourhood based algo- 
rithms. We suggested earlier that a successful multi-objectivization may involve 
decomposing the original function into a number of sub-problems and that these 
sub-problems should be as independent as possible. In the H-IFF/MH-IFF example,
we can separate the two competing components of the problem completely,
making the problem very easy for a hill-climber. In TSP this is not so easy:
there is no (known) method to decompose NP-hard problems, like TSP, that creates
independent sub-problems; indeed, it is the interdependency of the problem
components that puts them in this class. Nonetheless, we suggest that there may
be some merit in examining further the parallels between sub-problems in SOO
and objectives in MOO. Specifically, if different objectives separate out different
components of a problem then different points on the Pareto front correspond to
solutions that are more or less specialized to different sub-problems. Our experiments
support the intuition that it will be easier to discover a set of different
specialists than it is to discover a generalist directly. In the experiments in this
paper we used only mutation to investigate this, but if we could find some way
to combine together different specialists from the front then perhaps it would
facilitate discovery of generalists more directly. In a sense, this is the intuition
behind recombination in the multi-objective Messy GA (MOMGA).

^ Using a large-sample test of hypothesis for the difference in two population means
(which does not depend upon distribution), at the 95% confidence level.

However, the MOMGA assumes a number of objectives given a priori, and 
like other MOO methods, it is not intended as a technique for problem decom- 
position in SOO problems. But related research suggests that it may be possible 
to automatically discover objectives relating to sub-problems, and thereby ap- 
ply a MOMGA-like algorithm to SOO problems. The Symbiogenic Evolutionary 
Adaptation Model shares some features with the MOMGA but uses group
evaluation of individuals to encourage them to self-organize to cover the prob- 
lem space, and automatically discover sub-problems to be used as the objectives. 
This algorithm successfully solves the H-IFF problem without the introduction 
of additional objectives provided by the researcher. 

7 Conclusions 

In this paper, we have defined a process that we call “multi-objectivization” 
whereby the scalar function of a SOO problem is replaced by a vector of func- 
tions such that the resulting MOO problem has Pareto optima which coincide 
with the optima of the original problem. We investigated the effects of this trans- 
formation, in particular, the reduction of local optima for hill-climbing style al- 
gorithms. We illustrated the effect of the approach first on an abstract building- 
block problem, H-IFF, that is trivially amenable to such a decomposition. We 
then investigated the approach further, using several small instances of the TSP, 
where decomposition is inherently difficult. We defined a multi-objectivization 
of the problem based on minimizing two sub-tours. Our results showed that this 
simple multi-objectivization does seem to reduce the effect of local optima on 
simple hill-climbing algorithms. These preliminary results suggest that there is
a link between the presence of local optima in SOO problems and an underly- 
ing conflict of implicit objectives, and they shed some light on the processes of 
multi-objective search. 

Acknowledgments. The authors would like to thank Anthony Bucci, Michiel 
de Jong, and the anonymous reviewers for their excellent comments and criti- 
cisms. 
Abstract. Over the past few years, researchers have developed a number
of multi-objective evolutionary algorithms (MOEAs). Although most
studies concentrated on solving unconstrained optimization problems,
there exist a few studies where MOEAs have been extended to solve
constrained optimization problems. As constraint handling MOEAs
become popular, there is a need for developing test problems which can
evaluate the algorithms well. In this paper, we review a number of test
problems used in the literature and then suggest a set of tunable test
problems for constraint handling. Finally, NSGA-II with an innovative
constraint handling strategy is compared with a couple of existing algorithms
in solving some of the test problems.



1 Introduction 



Multi-objective evolutionary algorithms (MOEAs) have amply demonstrated
the advantage of using population-based search algorithms for solving multi-objective
optimization problems. In multi-objective optimization
problems of varying degrees of complexity, elitist MOEAs have demonstrated
their ability to converge close to the true Pareto-optimal front and to maintain
a diverse set of solutions. Despite all these developments, there seem to
be not enough studies concentrating on procedures for handling constraints. Constraint
handling is a crucial part of real-world problem solving and it is time
that MOEA researchers focus on solving constrained multi-objective optimization
problems.

To evaluate any algorithm, there is a need for well understood and tunable 
test problems. Although there exist a few constraint handling procedures in the
literature, they all have been tried on simple problems, particularly having only 
a few decision variables and having inadequately nonlinear constraints. When 
an algorithm performs well on such problems, it becomes difficult to evaluate 
the true merit of it, particularly in the context of its general problem solving. 

In this paper, we briefly outline a few constraint handling methods suggested
in the literature. Thereafter, we review a few popular test problems for
their degrees of difficulty. Next, we propose a tunable test problem generator
having six parameters, which can be set to obtain constrained test problems
of a desired degree of difficulty. Simulation results with NSGA-II along with
the constrained-domination principle and with two other constraint handling
procedures bring out the superiority of NSGA-II in solving difficult constrained
optimization problems. The difficulty demonstrated by the test problems will
qualify them to be used as standard constrained multi-objective test problems
in the years to come.

2 Constrained Multi-objective Evolutionary Algorithms 

There exist only a few studies where an MOEA is specifically designed for
handling constraints. Among all methods, the most common is the penalty function
approach, where a penalty proportional to the total constraint violation is added
to all objective functions. When applying this procedure, all constraints and
objective functions must be normalized.
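
As a rough illustration of the penalty function approach described above, the following minimal sketch adds the same penalty term to every objective. The function name and the `penalty_coeff` parameter are hypothetical, and the inputs are assumed to be already normalized; this is not taken from any particular MOEA implementation.

```python
def penalized_objectives(objectives, violations, penalty_coeff=1.0):
    """Add a penalty proportional to the total constraint violation
    to every objective (all objectives are assumed to be minimized).

    objectives    -- list of normalized objective values
    violations    -- list of non-negative normalized constraint violations
                     (0.0 for a satisfied constraint)
    penalty_coeff -- hypothetical user-chosen penalty parameter
    """
    total_violation = sum(violations)
    return [f + penalty_coeff * total_violation for f in objectives]


# A feasible solution is left unchanged; an infeasible one is worsened
# equally in all objectives.
print(penalized_objectives([1.0, 2.0], [0.0, 0.0], 10.0))
print(penalized_objectives([1.0, 2.0], [0.0, 0.5], 10.0))
```

The sketch makes the sensitivity to the penalty parameter visible: the ranking of infeasible solutions depends directly on the chosen coefficient.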

Jimenez et al. [7] suggested a procedure which carefully compares two solutions
in a binary tournament selection. If one solution is feasible and the other is
not, the feasible solution is chosen. If both solutions are feasible, Horn et al.’s
niched-Pareto GA is used. On the other hand, if both solutions are infeasible,
the solution closer to the constraint boundary is more likely to be chosen.

Ray et al. suggested a more elaborate constraint handling technique,
where the violations of all constraints are not simply summed together;
instead, a non-domination check of constraint violations is made. Because it is based on
separate non-domination sortings of a population using objective functions alone,
constraint violations alone, and objective functions and constraint violations together,
the algorithm has a large computational complexity.

Recently, Deb et al. defined a constraint-domination principle, which differentiates
infeasible from feasible solutions during the non-dominated sorting
procedure:

Definition 1 A solution i is said to constrained-dominate a solution j, if any
of the following conditions is true:

1. Solution i is feasible and solution j is not. 

2. Solutions i and j are both infeasible, but solution i has a smaller overall 
constraint violation. 

3. Solutions i and j are feasible and solution i dominates solution j. 

This requires no additional constraint-handling procedure to be used in an
MOEA. Since the procedure is generic, the above constraint-domination principle
can be used with any unconstrained MOEA.
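
The three conditions of Definition 1 can be sketched as a single comparison function. This is a minimal sketch, assuming minimization of all objectives and an overall constraint violation that is zero for feasible solutions; the function names and the representation of a solution as an objective tuple plus a violation value are illustrative choices, not part of the original algorithm description.

```python
def dominates(f_i, f_j):
    """Usual Pareto dominance for minimization: f_i is no worse in every
    objective and strictly better in at least one."""
    no_worse = all(a <= b for a, b in zip(f_i, f_j))
    strictly_better = any(a < b for a, b in zip(f_i, f_j))
    return no_worse and strictly_better


def constrained_dominates(f_i, cv_i, f_j, cv_j):
    """Return True if solution i constrained-dominates solution j.

    f_i, f_j   -- objective vectors
    cv_i, cv_j -- overall constraint violations (0.0 means feasible)
    """
    feasible_i = cv_i == 0.0
    feasible_j = cv_j == 0.0
    if feasible_i and not feasible_j:          # condition 1
        return True
    if not feasible_i and not feasible_j:      # condition 2
        return cv_i < cv_j
    if feasible_i and feasible_j:              # condition 3
        return dominates(f_i, f_j)
    return False                               # i infeasible, j feasible
```

Because only the comparison changes, a function of this kind can replace the ordinary dominance check inside a non-dominated sorting routine without any other modification.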

3 Past Test Problems 

In the context of multi-objective optimization, constraints may hinder
a multi-objective EA (MOEA) from converging to the true Pareto-optimal region
and may also cause difficulty in maintaining a diverse set of Pareto-optimal solutions.
It is intuitive that the success of an MOEA in tackling both these problems
will largely depend on the constraint-handling technique used. However, first we
present some test problems commonly used in the literature and then discuss
a systematic procedure of developing a test problem generator. Veldhuizen
cites a number of constrained test problems used by several researchers. We
present here three problems which are often used by researchers.

Srinivas and Deb used the following function:

\[
\text{SRN:}\quad
\begin{array}{ll}
\text{Minimize}   & f_1(\mathbf{x}) = 2 + (x_1 - 2)^2 + (x_2 - 1)^2, \\
\text{Minimize}   & f_2(\mathbf{x}) = 9x_1 - (x_2 - 1)^2, \\
\text{Subject to} & c_1(\mathbf{x}) \equiv x_1^2 + x_2^2 \le 225, \\
                  & c_2(\mathbf{x}) \equiv x_1 - 3x_2 + 10 \le 0, \\
                  & -20 \le x_1 \le 20, \quad -20 \le x_2 \le 20.
\end{array}
\tag{1}
\]

Figure 1 shows the feasible decision variable space and the Pareto-optimal set.
The Pareto-optimal solutions correspond to $x_1^* = -2.5$, $x_2^* \in [2.50, 14.79]$.
The feasible objective space along with the Pareto-optimal solutions are shown
in Figure 2. The only difficulty the constraints introduce in this problem is that
they eliminate some part of the unconstrained Pareto-optimal set (shown with
a dashed line).

Fig. 1. Decision variable space for the test problem SRN.

Fig. 2. Objective space for the test problem SRN.
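
For readers who want to experiment with SRN, the following minimal sketch evaluates the two objectives and the feasibility of a point. It assumes the usual statement of the problem (second constraint $x_1 - 3x_2 + 10 \le 0$); the function name and return layout are illustrative.

```python
def srn(x1, x2):
    """Evaluate the SRN test problem at (x1, x2).

    Returns ((f1, f2), feasible), where both objectives are minimized.
    """
    f1 = 2.0 + (x1 - 2.0) ** 2 + (x2 - 1.0) ** 2
    f2 = 9.0 * x1 - (x2 - 1.0) ** 2
    c1 = x1 ** 2 + x2 ** 2          # feasible when c1 <= 225
    c2 = x1 - 3.0 * x2 + 10.0       # feasible when c2 <= 0
    in_bounds = -20.0 <= x1 <= 20.0 and -20.0 <= x2 <= 20.0
    feasible = in_bounds and c1 <= 225.0 and c2 <= 0.0
    return (f1, f2), feasible


print(srn(-2.5, 5.0))   # a feasible point
print(srn(0.0, 0.0))    # violates the second constraint
```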
Tanaka suggested the following two-variable problem:

\[
\text{TNK:}\quad
\begin{array}{ll}
\text{Minimize}   & f_1(\mathbf{x}) = x_1, \\
\text{Minimize}   & f_2(\mathbf{x}) = x_2, \\
\text{Subject to} & c_1(\mathbf{x}) \equiv x_1^2 + x_2^2 - 1 - 0.1 \cos\left( 16 \arctan \dfrac{x_1}{x_2} \right) \ge 0, \\
                  & c_2(\mathbf{x}) \equiv (x_1 - 0.5)^2 + (x_2 - 0.5)^2 \le 0.5, \\
                  & 0 \le x_1 \le \pi, \quad 0 \le x_2 \le \pi.
\end{array}
\tag{2}
\]

The feasible decision variable space is shown in Figure 3. Since $f_1 = x_1$ and
$f_2 = x_2$, the feasible objective space is also the same as the feasible decision
variable space. The unconstrained decision variable space consists of all solutions
in the square $0 \le (x_1, x_2) \le \pi$. Thus, the only unconstrained Pareto-optimal
solution is $x_1^* = x_2^* = 0$. However, the inclusion of the first constraint makes this
solution infeasible. The constrained Pareto-optimal solutions lie on the boundary
of the first constraint. Since the constraint function is periodic and the second
constraint function must also be satisfied, not all solutions on the boundary are
Pareto-optimal. The disconnected Pareto-optimal sets are shown in the figure
with thick solid lines. Since the Pareto-optimal solutions lie on a non-linear
constraint surface, an optimization algorithm may find difficulty in finding a
spread of solutions across the entire Pareto-optimal front.

Fig. 3. The feasible decision variable and objective search spaces are shown for TNK.

Fig. 4. Pareto-optimal region for the test problem OSY.
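
A minimal sketch of the TNK evaluation is given below. It assumes the argument of the arctangent is $x_1/x_2$, as in the usual statement of Tanaka's problem, and uses `math.atan2` as an implementation choice so that the boundary $x_2 = 0$ does not cause a division by zero; the function name is illustrative.

```python
import math


def tnk(x1, x2):
    """Evaluate the TNK test problem at (x1, x2).

    Returns ((f1, f2), (c1, c2), feasible). The first constraint requires
    c1 >= 0 and the second requires c2 <= 0.5.
    """
    f1, f2 = x1, x2
    # atan2(x1, x2) equals arctan(x1 / x2) for x2 > 0 and stays defined at x2 = 0.
    angle = math.atan2(x1, x2)
    c1 = x1 ** 2 + x2 ** 2 - 1.0 - 0.1 * math.cos(16.0 * angle)
    c2 = (x1 - 0.5) ** 2 + (x2 - 0.5) ** 2
    in_bounds = 0.0 <= x1 <= math.pi and 0.0 <= x2 <= math.pi
    feasible = in_bounds and c1 >= 0.0 and c2 <= 0.5
    return (f1, f2), (c1, c2), feasible


# The unconstrained optimum (0, 0) violates the first constraint.
print(tnk(0.0, 0.0))
print(tnk(1.0, 1.0))
```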
Osyczka and Kundu used the following six-variable test problem:

\[
\text{OSY:}\quad
\begin{array}{ll}
\text{Minimize}   & f_1(\mathbf{x}) = -\left( 25(x_1 - 2)^2 + (x_2 - 2)^2 + (x_3 - 1)^2 + (x_4 - 4)^2 + (x_5 - 1)^2 \right), \\
\text{Minimize}   & f_2(\mathbf{x}) = x_1^2 + x_2^2 + x_3^2 + x_4^2 + x_5^2 + x_6^2, \\
\text{Subject to} & c_1(\mathbf{x}) \equiv x_1 + x_2 - 2 \ge 0, \\
                  & c_2(\mathbf{x}) \equiv 6 - x_1 - x_2 \ge 0, \\
                  & c_3(\mathbf{x}) \equiv 2 - x_2 + x_1 \ge 0, \\
                  & c_4(\mathbf{x}) \equiv 2 - x_1 + 3x_2 \ge 0, \\
                  & c_5(\mathbf{x}) \equiv 4 - (x_3 - 3)^2 - x_4 \ge 0, \\
                  & c_6(\mathbf{x}) \equiv (x_5 - 3)^2 + x_6 - 4 \ge 0, \\
                  & 0 \le x_1, x_2, x_6 \le 10, \quad 1 \le x_3, x_5 \le 5, \quad 0 \le x_4 \le 6.
\end{array}
\tag{3}
\]

There are six constraints, of which four are linear. Since it is a six-variable
problem, it is difficult to show the feasible decision variable space. However, a
careful analysis of the constraints and the objective functions reveals the constrained
Pareto-optimal front, as shown in Figure 4. The Pareto-optimal region
is a concatenation of five regions. Every region lies on the intersection of certain
constraints. But for the entire Pareto-optimal region, $x_4^* = x_6^* = 0$. Table 1
shows the other variable values for each of the five regions. Since the entire
Pareto-optimal region demands an MOEA population to maintain subpopulations
at different intersections of constraint boundaries, it is a difficult problem
to solve.

Table 1. Pareto-optimal regions for the problem OSY.

Region  x_1^*          x_2^*          x_3^*          x_5^*  Active constraints
AB      5              1              (1,...,5)      5      2, 4, 6
BC      5              1              (1,...,5)      1      2, 4, 6
CD      (4.056,...,5)  (x_1^* - 2)/3  1              1      4, 5, 6
DE      0              2              (1,...,3.732)  1      1, 3, 6
EF      (0,...,1)      2 - x_1^*      1              1      1, 5, 6
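
The statement that each Pareto-optimal region of OSY lies on an intersection of constraints can be checked numerically. The sketch below evaluates the OSY objectives and constraints and lists the constraints that hold with equality at a point; the helper `active_constraints` and its tolerance are illustrative assumptions.

```python
def osy(x):
    """Evaluate the OSY test problem for x = (x1, ..., x6).

    Returns ((f1, f2), c), where c lists the six constraint values;
    a point is feasible when every entry of c is >= 0.
    """
    x1, x2, x3, x4, x5, x6 = x
    f1 = -(25.0 * (x1 - 2) ** 2 + (x2 - 2) ** 2 + (x3 - 1) ** 2
           + (x4 - 4) ** 2 + (x5 - 1) ** 2)
    f2 = sum(v ** 2 for v in x)
    c = [
        x1 + x2 - 2,
        6 - x1 - x2,
        2 - x2 + x1,
        2 - x1 + 3 * x2,
        4 - (x3 - 3) ** 2 - x4,
        (x5 - 3) ** 2 + x6 - 4,
    ]
    return (f1, f2), c


def active_constraints(c, tol=1e-9):
    """Return the 1-based indices of constraints satisfied with equality."""
    return [j + 1 for j, cj in enumerate(c) if abs(cj) <= tol]


# A point in region AB (x1 = 5, x2 = 1, x3 = 3, x5 = 5, x4 = x6 = 0).
objectives, c = osy((5, 1, 3, 0, 5, 0))
print(objectives, active_constraints(c))
```

For this point the active set is constraints 2, 4, and 6, which agrees with the AB row of the table of Pareto-optimal regions.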

4 Proposed Test Problems 

Although the above test problems (and many others that are discussed elsewhere)
somewhat test an MOEA’s ability to handle constrained multi-objective
optimization problems, they have a number of difficulties: (i) they have only a few
decision variables, (ii) most objective functions and constraints are not adequately
nonlinear, and (iii) they are not tunable for introducing varying degrees
of complexity in constrained optimization. As in the case of tunable unconstrained
test problems, we also suggest here a number of test problems where
the complexity of the constrained search space can be controlled. The proposed test
problems are designed to cause two different kinds of tunable difficulty to a
multi-objective optimization algorithm:

1. Difficulty near the Pareto-optimal front, and 

2. Difficulty in the entire search space. 

We discuss them in the following subsections. 

4.1 Difficulty in the Vicinity of the Pareto-Optimal Front 

In this test problem, constraints do not make any major portion of the search 
region infeasible, except very near to the Pareto-optimal front. Constraints are 
designed in a way so that some portion of the unconstrained Pareto-optimal
region becomes infeasible. This way, the overall Pareto-optimal front will con- 
stitute some part of the unconstrained Pareto-optimal region and some part of 
the constraint boundaries. In the following, we present a generic test problem: 

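\[
\begin{array}{ll}
\text{Minimize}   & f_1(\mathbf{x}) = x_1, \\
\text{Minimize}   & f_2(\mathbf{x}) = g(\mathbf{x}) \exp\left( -f_1(\mathbf{x}) / g(\mathbf{x}) \right), \\
\text{Subject to} & c_j(\mathbf{x}) \equiv f_2(\mathbf{x}) - a_j \exp\left( -b_j f_1(\mathbf{x}) \right) \ge 0, \quad j = 1, 2, \ldots, J.
\end{array}
\tag{4}
\]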

Here, the function g(x) can be a complex multi-variable function. There are 
J inequality constraints and the parameters (aj,bj) must be chosen in a way 
so that a portion of the unconstrained Pareto-optimal region is infeasible. We 
describe a procedure to calculate (aj,bj) parameters for J constraints: 

Step 1 Set $j = 0$, $a_j = b_j = 1$. Also set $\Delta = 1/(J + 1)$ and $x = \Delta$.

Step 2 Calculate $y = a_j \exp(-b_j x)$ and $a_{j+1} = (a_j + y)/2$, $b_{j+1} =
-\frac{1}{x} \ln(y / a_{j+1})$. Increment $x = x + \Delta$ and $j = j + 1$.

Step 3 If $j < J$, go to Step 2, else stop.

For J = 2 constraints, the following parameter values are found: 



j    a_j     b_j
1    0.858   0.541
2    0.728   0.295
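
The three-step procedure for computing the constraint parameters can be sketched in a few lines. This is a minimal sketch assuming the update $b_{j+1} = -\ln(y/a_{j+1})/x$, which reproduces the tabulated values for J = 2; the function name is illustrative.

```python
import math


def ctp1_parameters(J):
    """Compute (a_j, b_j) for j = 1, ..., J following Steps 1 to 3."""
    a, b = 1.0, 1.0                 # Step 1: a_0 = b_0 = 1
    delta = 1.0 / (J + 1)
    x = delta
    params = []
    for _ in range(J):              # Steps 2 and 3: repeat J times
        y = a * math.exp(-b * x)
        a_next = (a + y) / 2.0
        b_next = -math.log(y / a_next) / x
        params.append((a_next, b_next))
        a, b = a_next, b_next
        x += delta
    return params


for j, (a_j, b_j) in enumerate(ctp1_parameters(2), start=1):
    print(j, round(a_j, 3), round(b_j, 3))
```

Each new constraint curve passes through the midpoint between the previous curve's coefficient and its value at the current x, so that successive constraints cut off successive parts of the unconstrained front.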



Figure 5 shows the unconstrained Pareto-optimal region (in a dashed line), and
the two constraints. With both constraints present, the figure demonstrates
that about one-third of the original unconstrained Pareto-optimal region
is no longer feasible. The other two-thirds of the constrained Pareto-optimal
region come from the two constraints.

Since a part of the constraint boundary of each constraint now constitutes
the Pareto-optimal region, a spread in Pareto-optimal solutions requires decision
variables (x) to satisfy the inequality constraints with the equality sign. Each
constraint is an implicit non-linear function of decision variables. Thus, it may
be difficult to find a number of solutions on a non-linear constraint boundary.
The complexity of the test problem can be further increased by using more
constraints and by using a multi-modal or deceptive g function.

Fig. 5. Constrained test problem CTP1 with J = 2 constraints.

Besides finding and maintaining correlated decision variables to fall on the 
several constraint boundaries, there could be other difficulties near the Pareto- 
optimal front. The constraint functions can be such that the unconstrained 
Pareto-optimal region is now infeasible and the resulting Pareto-optimal set is 
a collection of a number of discrete regions. At the extreme, the Pareto-optimal 
region can become a collection of a set of discrete solutions. Let us first present 
such a function mathematically and then describe the difficulties in each case. 

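\[
\begin{array}{ll}
\text{Minimize}   & f_1(\mathbf{x}) = x_1, \\
\text{Minimize}   & f_2(\mathbf{x}) = g(\mathbf{x}) \left( 1 - \dfrac{f_1(\mathbf{x})}{g(\mathbf{x})} \right), \\
\text{Subject to} & c(\mathbf{x}) \equiv \cos(\theta)\,(f_2(\mathbf{x}) - e) - \sin(\theta)\, f_1(\mathbf{x}) \ge \\
                  & \qquad a \left| \sin\left( b\pi \left( \sin(\theta)\,(f_2(\mathbf{x}) - e) + \cos(\theta)\, f_1(\mathbf{x}) \right)^{c} \right) \right|^{d}.
\end{array}
\tag{5}
\]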

The decision variable $x_1$ is restricted in [0, 1] and the bounds of other variables
depend on the chosen $g(\mathbf{x})$ function. The constraint has six parameters ($\theta$, a, b,
c, d, and e). In fact, the above problem can be used as a constrained test problem
generator where constrained test problems with different complexities will evolve
by simply tuning these six parameters. We demonstrate this by constructing
different test problems, where the effect of each parameter is also discussed.
First, we use the following parameter values:

$\theta = -0.2\pi$, $a = 0.2$, $b = 10$, $c = 1$, $d = 6$, $e = 1$.
The resulting feasible objective space is shown in Figure 6. It is clear from
the figure that the unconstrained Pareto-optimal region becomes infeasible in 
the presence of the constraint. The periodic nature of the constraint boundary 
makes the Pareto-optimal region discontinuous, having a number of disconnected 
continuous regions. The task of an optimization algorithm would be to find as 
many such disconnected regions as possible. The number of disconnected regions 
can be controlled by increasing the value of the parameter b. It is also clear that 
with the increase in number of disconnected regions, an algorithm will have 
difficulty in finding representative solutions in all disconnected regions. 
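
The six-parameter constraint can be explored with a small helper that returns the signed margin of the inequality in objective space. This is a minimal sketch: it assumes the second objective has the form $f_2 = g\,(1 - f_1/g)$, the function names are illustrative, and the power `inner ** c` is only real-valued here when `inner` is non-negative or `c` is an integer.

```python
import math


def ctp_objectives(x1, g):
    """Objectives of the generator for a given x1 and a given value of g(x).
    Assumes f2 = g * (1 - f1 / g)."""
    f1 = x1
    f2 = g * (1.0 - f1 / g)
    return f1, f2


def ctp_constraint_margin(f1, f2, theta, a, b, c, d, e):
    """Left-hand side minus right-hand side of the constraint.
    The point (f1, f2) is feasible when the result is >= 0."""
    lhs = math.cos(theta) * (f2 - e) - math.sin(theta) * f1
    inner = math.sin(theta) * (f2 - e) + math.cos(theta) * f1
    # Note: a negative `inner` with a non-integer `c` would give a complex value.
    rhs = a * abs(math.sin(b * math.pi * inner ** c)) ** d
    return lhs - rhs


# Parameter values of the first test problem constructed in the text.
params = dict(theta=-0.2 * math.pi, a=0.2, b=10, c=1, d=6, e=1)
print(ctp_constraint_margin(0.0, 2.0, **params))   # positive: feasible
print(ctp_constraint_margin(1.0, 0.0, **params))   # negative: infeasible
```

Scanning this margin over a grid of (f1, f2) values is one simple way to visualize how the parameters b and d change the number and width of the disconnected feasible regions.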




Fig. 6. Constrained test problem CTP2. 




Fig. 7. Constrained test problem CTP3. 



The above problem can be made more difficult by using a small value of d, so 
that in each disconnected region there exists only one Pareto-optimal solution. 
Figure 7 shows the feasible objective space for d = 0.5 and a = 0.1 (while
other parameters are the same as that in the previous test problem). Although 
most of the feasible search space is continuous, near the Pareto-optimal region, 
the feasible search regions are disconnected, finally each subregion leading to 
a singular feasible Pareto-optimal solution. An algorithm will face difficulty in 
finding all discrete Pareto-optimal solutions because of the changing nature from 
continuous to discontinuous feasible search space near the Pareto-optimal region. 

The problem can have a different form of complexity by increasing the value 
of parameter a, which has an effect of making the transition from continuous to 
discontinuous feasible region far away from the Pareto-optimal region. Since an 
algorithm now has to travel through a long narrow feasible tunnel in search of 
the lone Pareto-optimal solution at the end of tunnel, this problem will be more 
difficult to solve compared to the previous problem. Figure 8 shows one such
problem with a = 0.75 and the rest of the parameters the same as in the previous
test problem.
Fig. 8. Constrained test problem CTP4.




Fig. 9. Constrained test problem CTP5. 



In all three of the above problems, the disconnected regions are equally distributed
in the objective space. The discrete Pareto-optimal solutions can be
scattered non-uniformly by using $c \neq 1$. Figure 9 shows the feasible objective
space for a problem with c = 2. Since c > 1, more Pareto-optimal solutions
lie towards the right (higher values of $f_1$). If, however, c < 1 is used, more Pareto-optimal
solutions will lie towards the left. For an even higher number of Pareto-optimal
solutions towards the right, a larger value of c can be chosen. The difficulty will arise
in finding all closely packed discrete Pareto-optimal solutions.

The parameters $\theta$ and e do not play a major role in producing
significant complexity. The parameter $\theta$ controls the slope of the Pareto-optimal
region, whereas the parameter e shifts the constraints up or down in the
objective space. For the above problems, the Pareto-optimal solutions lie on the
following straight line:

\[
(f_2(\mathbf{x}) - e)\cos\theta = f_1(\mathbf{x})\sin\theta. \tag{6}
\]

It is interesting to note that the location of this line is independent of the other major
parameters (a, b, c, and d). The above equation reveals that the parameter e
denotes the intercept of this line on the $f_2$ axis. The corresponding intercept on
the $f_1$ axis is $-e/\tan\theta$.
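
As a short check of the two intercepts, one can set each objective to zero in turn in the equation of the line (assuming $\cos\theta \neq 0$ and $\sin\theta \neq 0$). The numerical value at the end uses $\theta = -0.2\pi$ and $e = 1$ purely as an illustration.

\[
\begin{aligned}
f_1 = 0: &\quad (f_2 - e)\cos\theta = 0 \;\Rightarrow\; f_2 = e, \\
f_2 = 0: &\quad -e\cos\theta = f_1 \sin\theta \;\Rightarrow\; f_1 = -\frac{e\cos\theta}{\sin\theta} = -\frac{e}{\tan\theta}, \\
\theta = -0.2\pi,\; e = 1: &\quad f_1 = -\frac{1}{\tan(-0.2\pi)} = \frac{1}{\tan(0.2\pi)} \approx 1.376 .
\end{aligned}
\]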

It is important here to mention that although the above test problems will 
cause difficulty in the vicinity of the Pareto-optimal region, an algorithm has 
to maintain an adequate diversity much before it comes closer to the Pareto- 
optimal region. If an algorithm approaches the Pareto-optimal region without 
much diversity, it may be too late to create diversity among population mem- 
bers, as the feasible search region in the vicinity of the Pareto-optimal region is 
disconnected. 
4.2 Difficulty in the Entire Search Space

The above test problems provide difficulty to an algorithm in the vicinity of the
Pareto-optimal region. Difficulties may also come from the infeasible search re- 
gions present in the entire search space. Fortunately, the same constrained test 
problem generator can also be used here. 

Figure 10 shows the feasible objective search space for the following parameter
values:

$\theta = 0.1\pi$, $a = 40$, $b = 0.5$, $c = 1$, $d = 2$, $e = -2$.

Notice that the above parameter values are very different from those used in the
previous section. In particular, the value of the parameter a is two orders of magnitude
larger than before.

The objective space of this function has infeasible holes of differing widths 
towards the Pareto-optimal region. Since an algorithm has to overcome a number 
of such infeasible holes before coming to the island containing the Pareto-optimal 
front, an algorithm may face difficulty in solving this problem. Moreover, the 
unconstrained Pareto-optimal region is now not feasible. The entire constrained 
Pareto-optimal front lies on a part of the constraint boundary. (In this particular 
test problem, the Pareto-optimal region corresponds to all solutions satisfying 
$1 \le ((f_2 - e)\sin\theta + f_1\cos\theta) \le 2$.) The difficulty can be increased further by
widening the infeasible regions (or by using a small value of d). 





Fig. 10. Constrained test problem CTP6. Fig. 11. Constrained test problem CTP7. 



The infeasibility in the objective search space can also come along the Pareto- 
optimal region. Using the following parameter values 



$\theta = -0.05\pi$, $a = 40$, $b = 5$, $c = 1$, $d = 6$, $e = 0$,

we obtain the feasible objective space, as shown in Figure 11. This problem makes
some portions of the unconstrained Pareto-optimal region feasible, thereby mak- 
ing a disconnected set of continuous regions. In order to find all such disconnected 
regions, an algorithm has to maintain an adequate diversity right from the be- 
ginning of a simulation run. Moreover, the algorithm also has to maintain its 
solutions feasible as it proceeds towards the Pareto-optimal region. 

Once again, the difficulty can be increased by reducing the width of the 
feasible regions by simply using a smaller value of d. A careful thought will 
reveal that this test problem is similar to CTP4, except that a large value of the 
parameter a is now used. This initiates the disconnectedness in the feasible region 
far away from the Pareto-optimal region. If it initiates away from the search 
region dictated by the variable boundaries, the entire search region becomes a 
patch of feasible and infeasible regions. 

With the above constraints, a combination of two or more effects can be
achieved together in a problem by considering more than one such constraint.

4.3 More Than Two Objectives 

Using the above concept, test problems having more than two objectives can 
also be developed. We modify equation (5) as follows. Using an M-dimensional
transformation (rotational R and translational e), we compute

\[
\mathbf{f}' = R^{-1}(\mathbf{f} - \mathbf{e}). \tag{7}
\]

The matrix R will involve (M - 1) rotation angles. Thereafter, the following
constraint can be used:

[Equation (8) is not legible in the source; only its summation limit, M - 1, survives.]

Here, all $a_j$, $b_j$, $c_j$, $d_j$, and $\theta_j$ are parameters that must be set to get a desired
effect. Like before, a combination of more than one such constraint can also be
considered.

5 Simulation Results

We compare NSGA-II with the constrained-domination principle, Jimenez’s algorithm,
and Ray et al.’s algorithm on test problems CTP1 to CTP7 with
Rastrigin’s function as the g function.

In all algorithms, we have used the simulated binary crossover with $\eta_c = 20$
and the polynomial mutation operator with $\eta_m = 20$, exactly the same as
that used with NSGA-II. Identical recombination and mutation operators are
used to investigate the effect of the constraint handling ability of each algorithm.
A crossover probability of 0.9 and a mutation probability of 1/n (where n is the
number of variables) are chosen. In all problems, five decision variables^ are used.
A population of size 100 and a maximum generation of 500 are used. For
Ray et al.’s algorithm, we have used a sharing parameter $\sigma_{share} = 0.158$. For
Jimenez’s algorithm, we have used $t_{dom} = 10$ and $\sigma_{share} = 0.158$. It is important
to highlight that NSGA-II does not require any extra parameter setting.
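
To give a feel for the recombination operator used in all three algorithms, the following minimal sketch applies simulated binary crossover to a single pair of real-valued parent variables with distribution index 20. It is the basic unbounded form of the operator; variable bounds, the crossover probability of 0.9, and the per-variable application used in complete implementations are omitted, and the function name is illustrative.

```python
import random


def sbx_pair(p1, p2, eta_c=20.0, rng=random):
    """Create two children from two parent values with simulated binary
    crossover (basic form, no variable bounds)."""
    u = rng.random()                       # uniform in [0, 1)
    if u <= 0.5:
        beta = (2.0 * u) ** (1.0 / (eta_c + 1.0))
    else:
        beta = (1.0 / (2.0 * (1.0 - u))) ** (1.0 / (eta_c + 1.0))
    c1 = 0.5 * ((1.0 + beta) * p1 + (1.0 - beta) * p2)
    c2 = 0.5 * ((1.0 - beta) * p1 + (1.0 + beta) * p2)
    return c1, c2


random.seed(0)
child1, child2 = sbx_pair(1.0, 3.0)
print(child1, child2)
```

The two children are always symmetric about the parents' midpoint, and a large distribution index such as 20 keeps them close to the parents with high probability.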

In all problems, Jimenez’s algorithm performed poorly, not converging
anywhere near the true Pareto-optimal region. Thus, we do not present the
solutions obtained using Jimenez’s algorithm.

Figure 12 shows that both NSGA-II and Ray et al.’s algorithm are able to
converge close to the true Pareto-optimal front on problem OSY, but NSGA-II’s 
convergence ability as well as ability to find diverse solutions are better. 




Fig. 12. Simulation results on OSY. 



Figure 13 shows the population after 500 generations with NSGA-II and Ray
et al.’s algorithm on CTP2. NSGA-II is able to find all disconnected Pareto-optimal
solutions, whereas Ray et al.’s algorithm could not converge well on all
disconnected regions. 

Problem CTP3 degenerates to only one solution in each disconnected Pareto-optimal
region. NSGA-II is able to find a solution very close to the true Pareto-optimal
solution in each region (Figure 14). Ray et al.’s algorithm cannot quite
converge near the Pareto-optimal solutions.

As predicted, problem CTP4 caused difficulty to both algorithms. Figure 15
shows that both algorithms could not get near the true Pareto-optimal solutions.

The non-uniformity in spacing of the Pareto-optimal solutions does not seem to be
a great difficulty for NSGA-II (Figure 16), but Ray et al.’s algorithm has some
difficulty in converging to all Pareto-optimal solutions.

^ As the simulation results show, only five variables caused enough difficulty to the 
three chosen MOEAs. 
Fig. 13. Simulation results on CTP2. Fig. 14. Simulation results on CTP3.





Fig. 15. Simulation results on CTP4. Fig. 16. Simulation results on CTP5. 



When the entire search space consists of infeasible patches parallel to the 
Pareto-optimal front (CTP6 in Figure 17), both MOEAs are able to converge to
the right feasible patch and finally very near to the true Pareto-optimal front. 
All feasible patches are marked with a ‘F’. Although both algorithms performed 
well in this problem, NSGA-II is able to find a better convergence as well as a 
better spread of solutions. 

However, when infeasible patches exist perpendicular to the Pareto-optimal
front, both MOEAs had the most difficulty (Figure 18). Neither of them can find
solutions close to the true Pareto-optimal front. Although Ray et al.’s algorithm
maintained a better spread of solutions, NSGA-II is able to come closer to the
true front. We believe that this is one of the test problems where an algorithm
must maintain a good spread of solutions from the beginning of the run and
must also have a good converging ability. Algorithms which tend to converge
anywhere in the Pareto-optimal front first and then work on finding a spread of
solutions will end up finding solutions in a few of the feasible patches. NSGA-II
shows a similar behavior in this problem.

Fig. 17. Simulation results on CTP6. Fig. 18. Simulation results on CTP7.

6 Conclusion 

In this paper, we have discussed the need of having tunable test problems for 
constrained multi-objective optimization. Test problems commonly used in most 
MOEA studies are simple and the level of complexity offered by the problems 
cannot be changed. We have presented a test problem generator which has six 
controllable parameters. By setting different parameter values, we have created a 
number of difficult constrained test problems. The difficulty in solving these test 
problems has been demonstrated by applying three constraint handling MOEAs 
on a number of these test problems. Although all algorithms faced difficulty in 
solving the test problems, NSGA-II with a constrained-domination principle has 
outperformed the other two MOEAs. The study has also shown that certain 
problem features can cause more difficulty than others. 

Because of the tunable feature of the proposed test problems and demon- 
strated difficulties on a number of constrained MOEAs, we believe that these 
problems would be used as standard constrained test problems in the coming 
years of research in evolutionary multi-objective optimization. 

Acknowledgements. The authors acknowledge the support provided by the Department
of Science and Technology (DST) for this study.
Abstract. Evolutionary algorithms are becoming increasingly valuable in
solving large-scale, realistic engineering multiobjective optimization (MO)
problems, which typically require consideration of conflicting and competing
design issues. The new procedure, Constraint Method-Based Evolutionary
Algorithm (CMEA), presented in this paper is based upon underlying concepts 
in the constraint method described in the mathematical programming literature. 
Pareto optimality is achieved implicitly via a constraint approach, and 
convergence is enhanced by using beneficial seeding of the initial population. 
CMEA is evaluated by solving two test problems reported in the multiobjective 
evolutionary algorithm (MOEA) literature. Performance comparisons based on 
quantitative metrics for accuracy, coverage, and spread are presented. CMEA 
is relatively simple to implement and incorporate into existing implementations 
of evolutionary algorithm-based optimization procedures. 



1 Introduction 

Tradeoff information in the form of a noninferior, or Pareto optimal set, of solutions 
is important in considering competing design objectives when making decisions 
associated with most engineering problems. The use of standard mathematical 
programming-based techniques, such as the constraint method and the weighting 
method (Cohon, 1978) for generating noninferior sets, in many practical engineering 
problems is limited because the models describing the engineering processes seldom 
fit into the restrictive function forms required by these mathematical programming 
techniques. As evolutionary algorithms offer a relatively more flexible way to 
analyze and solve realistic engineering design problems, their use in multi criterion 
decision making is becoming increasingly important. An array of multiobjective 
evolutionary algorithms (MOEAs) has been reported since the early eighties. Detailed 
summaries of the state-of-the-art in MOEA were discussed recently by Coello (1999a) 
and Van Veldhuizen and Lamont (2000), and are represented in the special issue of 
Evolutionary Computation (Vol. 8, No. 2, Summer 2000) on multi criterion 
optimization (also see Coello (1999b) for an archive of bibliography). 

The procedure presented in this paper integrates the underlying concepts of 
traditional multiobjective optimization (MO) techniques into an evolutionary 
algorithm framework. In the conventional constraint method for MO, one objective is 
optimized while constraining all other objectives to target levels to identify a 
noninferior, or Pareto optimal, solution. This is applied iteratively to generate the 
noninferior set. Solving a number of independent single objective optimization 
models to obtain the noninferior set, however, renders these MO methods less 
attractive if obtaining each noninferior solution is computationally intensive. This 
issue is addressed in some traditional mathematical programming approaches (e.g., 
linear programming, nonlinear programming) by seeding a search for a new 
noninferior solution with a previously generated adjacent noninferior solution (i.e., 
prior basis). Since, for some classes of problems, noninferior solutions adjacent in the 
decision space map to adjacent points in the objective space, this seeding approach 
reduces the number of steps in the search for each noninferior solution. 

This paper describes the development of the Constraint Method-Based 
Evolutionary Algorithm (CMEA), which explicitly uses this adjacency mapping 
property to its advantage within an MOEA framework. The underlying simple 
concepts in CMEA allow it to be applied easily within existing implementations of 
evolutionary algorithms for single objective optimization. First, the general 
representation of a standard MO problem and definitions of Pareto optimality are 
presented, followed by a brief overview of existing MOEAs and their applications. 
The details of CMEA are then described, followed by its application to a couple of 
published test problems. An extensive performance comparison of CMEA and four 
commonly accepted MOEAs in solving one of the published test problems is 
presented. The noninferior solutions are also compared to those obtained via 
mathematical programming techniques. Finally, concluding remarks are made with a 
brief discussion of CMEA’s strengths and weaknesses. 



2 Background 



2.1 A Standard Multiobjective Optimization Problem 

A general multiobjective optimization problem consisting of k competing objectives 
and m constraints defined as functions of decision variable set x can be represented as 
follows: 



Maximize      $Z(x) = \{Z_l(x) : l = 1, 2, \ldots, k\}$      (1) 

Subject to    $g_i(x) \le 0 \quad \forall i = 1, 2, \ldots, m$      (2) 

              $x \in X$      (3) 

where $x = \{x_j : j = 1, 2, \ldots, n\}$ represents the decision vector, $x_j$ is the $j$th 
decision variable, $X$ represents the decision space, $g_i(x)$ is the $i$th constraint, 
$Z(x)$ is the multiobjective vector, and $Z_l(x)$ is the $l$th objective function. 




Constraint Method-Based Evolutionary Algorithm (CMEA) 301 



2.2 Noninferiority 

Noninferiority, which is also referred to as nondominance or Pareto optimality, of a 
solution is formally defined as follows (Cohon, 1978): a feasible solution to a 
multiobjective problem is noninferior if there exists no other feasible solution that 
will yield an improvement in one objective without causing a degradation in at least 
one other objective. 

Van Veldhuizen and Lamont (2000) and Zitzler et al. (2000) provide more 
rigorous definitions of this and related MO terminology. Based on the definitions by 
Van Veldhuizen and Lamont (2000) and notations used in Equations 1-3, the 
following are defined: 

Pareto Dominance: A multiobjective vector $u = (u_1, u_2, \ldots, u_k)$ is said to dominate 
$v = (v_1, v_2, \ldots, v_k)$ (denoted by $u \succeq v$) if and only if $u$ is partially more than $v$, 
i.e., $\forall l \in \{1, 2, \ldots, k\} : u_l \ge v_l \;\wedge\; \exists l \in \{1, 2, \ldots, k\} : u_l > v_l$. 

Pareto Optimality: A solution $x \in X$ is said to be Pareto optimal with respect to $X$ if 
and only if there exists no $x' \in X$ for which $v = Z(x')$ dominates $u = Z(x)$. 

Pareto Optimal Set: For a given multiobjective problem $Z(x)$, the Pareto optimal set 
$P^*$ is a set consisting of Pareto optimal solutions. $P^*$ is a subset of all the possible 
solutions in $X$. Mathematically, $P^*$ is defined as follows: 

$P^* := \{x \in X \mid \neg\exists\, x' \in X : Z(x') \succeq Z(x)\}$      (4) 

Pareto Front: The Pareto front, $PF^*$, is the set that contains the evaluated objective 
vectors of $P^*$. Mathematically, $PF^*$ is defined as: 

$PF^* := \{u = Z(x) \mid x \in P^*\}$      (5) 
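
The dominance test and the Pareto front defined above translate directly into a few lines of code. The following is a minimal sketch for the maximization convention of Equations 1-3; the function names `dominates` and `pareto_front` and the sample points are illustrative assumptions, not part of the original study.

```python
from typing import List, Sequence, Tuple


def dominates(u: Sequence[float], v: Sequence[float]) -> bool:
    """True if u dominates v under maximization: no worse in every
    objective and strictly better in at least one."""
    no_worse = all(a >= b for a, b in zip(u, v))
    strictly_better = any(a > b for a, b in zip(u, v))
    return no_worse and strictly_better


def pareto_front(points: List[Tuple[float, ...]]) -> List[Tuple[float, ...]]:
    """Objective vectors that are not dominated by any other vector."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical two-objective sample (both objectives maximized).
points = [(1, 5), (2, 4), (3, 3), (2, 2), (1, 1), (3, 1)]
front = pareto_front(points)
```

This brute-force filter needs a number of comparisons that grows quadratically with the number of points, which is the kind of pair-wise cost that Pareto-based selection schemes incur.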



2.3 Multiobjective Evolutionary Algorithms (MOEAs) 

Many procedures have been reported since the pioneering work by Schaffer (1984, 
1985) in the area of EAs for MO optimization. Fonseca and Fleming (1993, 1995), 
Horn (1997), Coello (1999a, 1999b), and Van Veldhuizen and Lamont (2000) present 
detailed surveys and catalogues of existing MOEAs. These surveys use many different 
bases, e.g., differences in fitness and selection implementations, for higher-level 
classification of MOEAs. For instance, Schaffer’s (1985) vector evaluated 
genetic algorithm (VEGA) uses a special single-objective-based preferential selection 
procedure, the method by Hajela and Lin (1992) uses an aggregated fitness function, 
and the methods by Horn et al. (1994), Srinivas and Deb (1994), Zitzler and Thiele 
(1999), and Knowles and Corne (2000) use Pareto-based selection procedures to 
determine the noninferior set. Further, these techniques can be categorized by the 
special operators they use, such as niching and sharing (e.g., Horn et al., 1994; 
Menczer et al., 2000), restrictive mating (e.g., Loughlin and Ranjithan, 1997), and 
elitism (e.g., Knowles and Corne, 2000; Zitzler and Thiele, 1999). Loughlin et al. 
(2000a) recently presented an EA-based approach that addresses problems with 
multiple objectives where some may not be easily quantified or modeled. Applications 
discussed by Hajela and Lin (1992), Ritzel et al. (1994), Cieniawski et al. (1995), 
Jimenez and Cadenas (1995), Harrell and Ranjithan (1997), Coello et al. (1998), 
Coello and Christiansen (2000), Loughlin et al. (2000b), and Obayashi et al. (2000) 
are some examples of the use of MOEAs in multiobjective optimization of varied 
realistic engineering problems. VEGA-Vector Evaluated Genetic Algorithm (Schaffer, 
1985), NPGA-Niched Pareto Genetic Algorithm (Horn et al., 1994), NSGA-Non-dominated 
Sorting Genetic Algorithm (Srinivas and Deb, 1994), and SPEA-Strength Pareto 
Evolutionary Algorithm (Zitzler and Thiele, 1999) are the four selected MOEAs 
against which the performance of CMEA is compared in this paper. 

3 CMEA - Constraint Method-Based Evolutionary Algorithm 

Based on the concepts of the mathematical programming-based constraint approach 
(Cohon, 1978) for generating the noninferior set, CMEA achieves Pareto optimality in 
an implicit manner by ensuring that the population migrates along the noninferior 
surface. At each iteration, the population converges to a noninferior solution by 
solving the following single objective optimization problem: 



Maximize      $Z_h(x)$      (6) 

Subject to    $g_i(x) \le 0 \quad \forall i = 1, 2, \ldots, m$      (7) 

              $Z_l(x) \ge Z_l^t \quad \forall l = 1, 2, \ldots, k;\ l \ne h$      (8) 

              $x \in X$      (9) 

where $Z_h$ is one of the $k$ objectives, and $Z_l^t$ ($l = 1, 2, \ldots, k;\ l \ne h$) is the 
constraint value for objective $l$ ($\ne h$) corresponding to a noninferior solution. By 
varying $Z_l^t$ incrementally, the search migrates from one noninferior solution to an 
adjacent solution, eventually tracing the noninferior surface. A two-objective 
illustration is shown in Figure 1. 

At each intermediate step in which one noninferior solution is obtained, the model 
given by Equations (6)-(9) is solved. A straightforward implementation of an 
algorithm that repeats this intermediate step would be similar to iterative execution of 
a single objective EA, which is not necessarily computationally efficient. Instead, 
CMEA exploits the basic concept that for some classes of problems, adjacent 
solutions in the decision space map to adjacent points in the objective space. The 
implication is that the decision vectors $x$ that map to adjacent noninferior points 
in the objective space have solution features (i.e., values of $x_j$) that are only 
marginally different. This enables the beneficial use of the final population 
corresponding to the current noninferior solution to seed the search of an adjacent 
noninferior solution. The new search of course would have an updated constraint set 
(8) to represent an adjacent noninferior point in the objective space. When the new 
selection pressure manifesting from the updated constraint vector is applied on the 
previous population, the population quickly migrates to an adjacent noninferior 
solution. A systematic update of the constraint set (8) thus enables an efficient 
mechanism for incrementally tracing the noninferior set. This incremental population 
migration approach significantly reduces the computational burden compared to that 
required when solving each single objective EA as independent search problems. 

Fig. 1. Illustration of the constraint method using a two-objective example 



Using a two-objective problem as an illustration (Figure 1), let the current 
constraint value for objective 1 be $Z_1^t$. While at noninferior point C, the updated 
constraint value corresponding to the search for an adjacent noninferior solution D 
would be $(Z_1^t - \Delta)$, where the magnitude of $\Delta$ determines the minimum interval 
desired between adjacent noninferior solutions. For example, smaller values of $\Delta$ 
would result in a finer coverage (or better distribution) of the noninferior set, but 
would require execution of more intermediate steps, each of which requires the 
solution of a single objective EA. At the beginning of the algorithm, the solutions to 
the two extreme points (A and B in Figure 1) are obtained. Using these solutions, the 
increment $\Delta$ is computed as $(Z_1^B - Z_1^A)/r$, where $r$ is a predefined number of 
intervals. Then the algorithm starts at extreme point B, and solves the model with the 
constraint value of (8) set to $Z_1^B - \Delta$. Once the population converges to the adjacent 
noninferior solution according to some stopping criterion, the best solution is stored. 
Then the constraint value is incremented adaptively to $Z_1^t \leftarrow Z_1^t - \Delta$, and the 
current population continually undergoes the evolutionary operators and fitness 
pressure. To introduce higher population diversity at the beginning of each search, 
the mutation operator is applied in an adaptive manner during each intermediate 
step, starting with a higher rate and gradually reducing it (e.g., exponential decay) 
with generations within each step. Thus, at the beginning of each intermediate step 
the higher mutation rate perturbs the converged population around the previous 
noninferior point, introducing diversity for the new search. 
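
The constraint schedule and the decaying mutation rate described above can be sketched as follows. This is an illustration only: the function names, the default rates, and the decay constant are assumptions, since the text specifies only that the increment is the range between the extreme points divided by the number of intervals and that the mutation rate decays (e.g., exponentially) within each step.

```python
import math
from typing import List


def constraint_levels(z1_at_b: float, z1_at_a: float, r: int) -> List[float]:
    """Constraint values visited when moving from extreme point B toward A.

    The increment is (Z1 at B - Z1 at A) / r; the last level equals Z1 at A,
    which is where the iterative process terminates.
    """
    delta = (z1_at_b - z1_at_a) / r
    return [z1_at_b - i * delta for i in range(1, r + 1)]


def mutation_rate(generation: int, start: float = 0.2,
                  floor: float = 0.01, decay: float = 0.1) -> float:
    """Mutation rate within one intermediate step (hypothetical settings).

    Starts at `start` when the constraint has just been updated and decays
    exponentially toward `floor` as the generation counter grows.
    """
    return floor + (start - floor) * math.exp(-decay * generation)


# Hypothetical extreme-point values for objective 1 and four intervals.
levels = constraint_levels(z1_at_b=10.0, z1_at_a=2.0, r=4)
```

Resetting the generation counter to zero at every constraint update is what restores the high mutation rate at the start of each intermediate step.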

This iterative process is terminated when the constraint value corresponds to the 
extreme point A, i.e., when $Z_1^t = Z_1^A$. Two convergence criteria are implemented to 
determine when to update the constraint value and when to initiate the search for the 
next noninferior solution. One of the criteria is to check if the number of generations, 
generation, exceeds a maximum value, maxGenerations. The other criterion is to 
track the improvement in the best solution in each intermediate step; convergence is 
assumed when the best solution does not improve within a certain number (N) of 
successive generations. If either of the above two criteria is satisfied then the 
constraint value is updated. The key steps of CMEA are shown as a flowchart in 
Figure 2. 

Fig. 2. Flowchart for CMEA - Constraint Method-Based Evolutionary Algorithm 
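
The two convergence criteria can be combined in a small helper. The sketch below assumes a maximization problem and a list holding the best objective value of each generation in the current intermediate step; the name `step_converged` and its parameters are hypothetical, and the exact stall test used in the original implementation is not stated in the text.

```python
from typing import Sequence


def step_converged(generation: int, best_history: Sequence[float],
                   max_generations: int, n_stall: int) -> bool:
    """Return True when the current intermediate step should end.

    Criterion 1: the generation counter exceeds max_generations.
    Criterion 2: the best value seen in the last n_stall generations is no
    better than the best value seen before them (no improvement).
    """
    if generation > max_generations:
        return True
    if len(best_history) > n_stall:
        recent_best = max(best_history[-n_stall:])
        earlier_best = max(best_history[:-n_stall])
        return recent_best <= earlier_best
    return False
```

When the helper returns True, the constraint value would be updated and the generation counter and history reset for the next noninferior point.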

Instead of converging the population to the noninferior set simultaneously, at each 
intermediate step of CMEA, a point in the noninferior set is identified through a 
search conducted by the whole population, and the final noninferior set is generated 
by storing all noninferior solutions found at the intermediate steps. The coverage of 
the noninferior set is achieved explicitly by traversing the noninferior surface through 
incremental and systematic updates of the constraint values. 



4 Testing and Evaluation of CMEA 

CMEA was applied to two test problems of different difficulty and characteristics. 
The first application uses Schaffer’s F2 problem (Schaffer, 1985), which is an 
unconstrained, nonlinear problem. This is included since most other MOEA methods 
have been tested against it, providing a common basis for comparison. While this 
application represents a problem in a continuous search space, the second application, 
which uses the extended 0/1 multiobjective knapsack problem (Zitzler and Thiele, 
1999), represents a problem in a combinatorial search space. This problem is a 
constrained, binary problem. Performance comparisons of several MOEAs in solving 
this problem are presented by Zitzler and Thiele (1999), and are used here to compare 
the performance of CMEA. In addition, a noninferior set was generated using a 
mathematical programming-based weighting method for the extended 0/1 knapsack 
problem, which was solved using the binary programming solver CPLEX®. 

Several performance criteria are used to evaluate CMEA and to compare it with 
other approaches: 1) accuracy, i.e., how close are the generated noninferior solutions 
to the best available prediction; 2) coverage, i.e., how many different noninferior 
solutions are generated and how well are they distributed; and 3) spread, i.e., what is 
the maximum range of the noninferior surface covered by the generated solutions. 
Currently reported as well as newly defined quantitative measures (Chetan, 2000) are 
used in comparing CMEA with other MOEAs. The robustness of CMEA in solving 
problems with different characteristics (e.g., real vs. binary variables, constrained vs. 
unconstrained, continuous vs. combinatorial) is examined, in some limited manner, by 
applying it to these two different problems. To evaluate the robustness of CMEA in 
generating the noninferior set and providing good coverage, random trials were 
performed where the problems were solved repeatedly for different random seeds. A 
representative solution is used in the comparisons below. 



4.1 Schaffer’s F2 Problem 



4.1.1 Description. The F2 problem is defined as follows: 

Minimize      $Z_1 = x^2$      (10) 

Minimize      $Z_2 = (x - 2)^2$      (11) 

The range for the decision variable $x$ is [-5, 7]. The Pareto optimal solutions 
constitute all $x$ values varying from 0 to 2. The solution $x = 0$ is optimum with respect 
to $Z_1$ while the solution $x = 2$ is optimum with respect to $Z_2$. That is, objective 
functions $Z_1$ and $Z_2$ are in conflict in the range [0, 2]. 
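
As a brief worked check, taking the standard form of Schaffer's F2, $Z_1 = x^2$ and $Z_2 = (x - 2)^2$, the conflict range and the exact tradeoff curve follow from the derivatives and from eliminating $x$:

```latex
\frac{dZ_1}{dx} = 2x, \qquad \frac{dZ_2}{dx} = 2(x - 2)
% The two derivatives have opposite signs only for 0 < x < 2,
% so improving one objective degrades the other exactly on that range.
x = \sqrt{Z_1} \;\Longrightarrow\; Z_2 = \left(\sqrt{Z_1} - 2\right)^2, \qquad 0 \le Z_1 \le 4
```

Outside $[0, 2]$ both objectives increase together as $x$ moves away from the interval, so no point there can be noninferior.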

4.1.2 Results. The F2 problem was solved using CMEA with algorithm-specific 
parameters as shown in Table 1. Results are compared in Figure 3 where the exact 
solution (obtained analytically using Equations (10) and (11)) for this problem is also 
shown. Although this is a relatively simple problem, the results indicate that CMEA 
is very accurate in generating the noninferior set for this problem. Also, it provides 
good coverage by generating a good distribution of noninferior solutions, and 
provides a full spread. 
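
To make the population-migration idea concrete for this problem, the following is a simplified sketch, not the authors' implementation. It assumes the standard F2 form ($Z_1 = x^2$, $Z_2 = (x-2)^2$), minimizes $Z_2$ subject to $Z_1 \le \varepsilon$, and uses a real-coded population with Gaussian mutation and a penalty term instead of the 32-bit binary encoding and uniform crossover listed in Table 1. All names and parameter values are illustrative.

```python
import math
import random
from typing import List, Tuple


def z1(x: float) -> float:
    return x * x


def z2(x: float) -> float:
    return (x - 2.0) ** 2


def fitness(x: float, eps: float, penalty: float = 1e3) -> float:
    """Lower is better: minimize Z2 subject to Z1 <= eps (penalized)."""
    return z2(x) + penalty * max(0.0, z1(x) - eps)


def cmea_f2(r: int = 10, pop_size: int = 30, gens_per_step: int = 40,
            seed: int = 0) -> List[Tuple[float, float, float]]:
    """Trace the F2 tradeoff with one population reused across all steps."""
    rng = random.Random(seed)
    lo, hi = -5.0, 7.0
    pop = [rng.uniform(lo, hi) for _ in range(pop_size)]
    z1_b, z1_a = 4.0, 0.0  # Z1 at the extreme points x = 2 and x = 0
    delta = (z1_b - z1_a) / r
    found = []
    for step in range(1, r + 1):
        eps = z1_b - step * delta  # updated constraint value
        for gen in range(gens_per_step):
            # Mutation strength restarts high at each step, then decays.
            sigma = 0.01 + 0.5 * math.exp(-0.2 * gen)
            children = []
            for _ in range(pop_size):
                a, b = rng.sample(pop, 2)  # binary tournament
                parent = a if fitness(a, eps) <= fitness(b, eps) else b
                child = parent + rng.gauss(0.0, sigma)
                children.append(min(hi, max(lo, child)))
            elite = min(pop, key=lambda x: fitness(x, eps))
            pop = sorted(children + [elite],
                         key=lambda x: fitness(x, eps))[:pop_size]
        x_best = pop[0]
        found.append((eps, z1(x_best), z2(x_best)))
        # The converged population is kept: it seeds the next step.
    return found


results = cmea_f2()
```

Each entry of `results` holds the constraint value and the objective values of the stored solution for that step; the population is never reinitialized, which is the seeding mechanism the text describes.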



Table 1. CMEA parameters and settings for solving the test problems 

                              CMEA Parameters 
Problem     Variable Type     No. of intervals     Pop. size     Encoding          Crossover 
F2          Real              100                  100           32 bit Binary     Uniform 
Knapsack    Binary            100                  100           Binary            Uniform 

Fig. 3. The true noninferior tradeoff curve and the noninferior set determined by CMEA for 
Schaffer’s F2 problem 

4.2 Extended 0/1 Multiobjective Knapsack Problem 

4.2.1 Description. Zitzler and Thiele (1999) used in their work a knapsack problem 
that extends the traditional single objective knapsack problem by incorporating two 
knapsacks that can be filled by items selected from a larger collection of items. 
Similar to the traditional knapsack problem, each knapsack has a limited weight 
capacity with different payoff when each item is included in it. The goal is to allocate 
a limited set of items to maximize the payoff in each knapsack without violating its 
weight capacity constraint. This multiobjective problem is defined mathematically as 
follows: 



Maximize      $Z_l(x) = \sum_{j=1}^{n} p_{lj} x_j \quad \forall l = 1, 2, \ldots, k$      (12) 

Subject to    $\sum_{j=1}^{n} w_{lj} x_j \le c_l \quad \forall l = 1, 2, \ldots, k$      (13) 

In the formulation, $Z_l(x)$ is the total profit associated with knapsack $l$, $p_{lj}$ = profit of 
placing item $j$ in knapsack $l$, $w_{lj}$ = weight of item $j$ when placed in knapsack $l$, $c_l$ = 
capacity of knapsack $l$, $x = (x_1, x_2, \ldots, x_n) \in \{0,1\}^n$ such that $x_j = 1$ if item $j$ is 
selected and $x_j = 0$ otherwise, $n$ is the number of available items and $k$ is the number 
of knapsacks. 

This binary MO problem was solved for the cases with two knapsacks (i.e., k = 2) 
and 250 and 500 items. The results reported here correspond to n = 500 and k = 2. The 
data for the problems solved were adapted from Zitzler and Thiele (1999). 
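
Evaluating a candidate solution for this formulation amounts to one profit sum and one weight sum per knapsack. The sketch below uses a tiny made-up instance with three items and two knapsacks; the function name and the data are illustrative assumptions and are unrelated to the instances used in the study.

```python
from typing import List, Sequence, Tuple


def evaluate_knapsack(x: Sequence[int],
                      profits: Sequence[Sequence[float]],
                      weights: Sequence[Sequence[float]],
                      capacities: Sequence[float]) -> Tuple[List[float], bool]:
    """Return the objective values of every knapsack and overall feasibility.

    profits[l][j] and weights[l][j] are the profit and weight of item j in
    knapsack l; x[j] is 1 if item j is selected and 0 otherwise.
    """
    totals = [sum(p * xj for p, xj in zip(row, x)) for row in profits]
    loads = [sum(w * xj for w, xj in zip(row, x)) for row in weights]
    feasible = all(load <= cap for load, cap in zip(loads, capacities))
    return totals, feasible


# Hypothetical instance: 2 knapsacks, 3 items.
profits = [[10, 5, 8], [4, 9, 6]]
weights = [[3, 4, 5], [6, 2, 4]]
capacities = [8, 8]

z_infeasible, ok_infeasible = evaluate_knapsack([1, 0, 1], profits, weights, capacities)
z_feasible, ok_feasible = evaluate_knapsack([1, 1, 0], profits, weights, capacities)
```

In an EA, the feasibility flag would typically drive a penalty or repair step; which of these was used in the original study is not stated here.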

4.2.2 Results. The extended knapsack problem was solved by CMEA for the 
parameter setting shown in Table 1. In addition, the noninferior set was generated 
using the constraint method for this problem by modeling it as a binary linear 
programming (BLP) model. This was solved using the binary linear programming 
solver, CPLEX®. In Figure 4, these results are shown along with the results reported 
by Zitzler and Thiele (1999) for the following MOEAs: VEGA, NPGA, NSGA, and 
SPEA. To examine the consistency of CMEA in solving this problem, ten trials with 
different random seeds were conducted, and the results were insensitive to the random 
seed, indicating robust behavior. 

Accuracy of the noninferior solutions generated by CMEA should be compared 
with respect to the best available noninferior set, as well as with the best estimate 
obtained by the other MOEAs. The mathematical programming-based estimate of the 
noninferior set, the best available for this problem, is included in Figure 4 to make the 
first evaluation. Compared to this, the accuracy of noninferior solutions generated by 
CMEA and the other MOEAs is relatively poor. The weak performance of all EAs can 
be attributed to the combinatorial nature of the search. Accuracy of CMEA in 
comparison to other MOEA results, however, is very good. Noninferior solutions 
obtained by SPEA, the best performing MOEA according to Zitzler and Thiele 
(1999), appear to dominate the solutions generated by CMEA in the corresponding 
noninferior region. The spread or range covered by the CMEA generated solutions, 
however, is far superior to that attained by all other MOEAs, including SPEA. 
Further, CMEA is able to provide good coverage by identifying noninferior solutions 
that are better distributed over a wider range of the noninferior set. 



5 Performance Metrics and Comparison of MOEAs 

To compare the performance of CMEA with that of other MOEAs, the following 
quantitative measures are used. 

- Accuracy: The S factor used by Zitzler and Thiele (1999) to represent the size of 
noninferior space covered is used to characterize and compare accuracy. In 
addition, the approach used by Knowles and Corne (2000) is used to characterize 
the degree to which a noninferior set outperforms another. An either-or criterion is 
used to determine if the noninferior set obtained by an MOEA dominates that 
obtained by another MOEA; the closeness of the two points of intersection is not 
differentiated statistically. 

- Spread: Spread is quantified for each objective as the fraction of the maximum 
possible range of that objective in the noninferior region covered by a noninferior 
set. Points A and B refer to the two extreme points, i.e., the single objective optimal 
solutions for objective 1 and 2, respectively, for a two-objective case. The 
maximum range covered by the noninferior solutions represented by the ordered 
set $C = \{C_h, \forall h \in \{0, 1, \ldots, q\}\}$ is $(Z_1^{C_0} - Z_1^{C_q})$ and $(Z_2^{C_q} - Z_2^{C_0})$ in $Z_1$ and $Z_2$ 
objective space, respectively. Therefore, the spread metrics in objective space 1 and 2 
are defined as $(Z_1^{C_0} - Z_1^{C_q})/(Z_1^A - Z_1^B)$ and $(Z_2^{C_q} - Z_2^{C_0})/(Z_2^B - Z_2^A)$, respectively. 

Fig. 4. A comparison of noninferior sets obtained using CMEA, VEGA, NPGA, NSGA, SPEA, 
and mathematical programming approach (BLP-Binary Linear Programming) for the extended 
0/1 multiobjective knapsack problem 



- Coverage: A quantitative measure computed based on the maximum gap in coverage 
is defined to represent the distribution of the noninferior solutions generated by an 
MOEA. The Euclidean distance between adjacent noninferior points in the objective 
space is used to indicate the gap. A smaller value of this metric indicates better 
distribution of solutions in the noninferior set. This metric is defined separately as V1 
and V2 to characterize the coverage within the range of noninferior region defined by 
1) the extreme points, and 2) the solutions generated by the MOEA, respectively. 
Using the illustration shown in Figure 5, V1 is defined as $\max\{d_h, \forall h \in \{0, 1, \ldots, q\}\}$, 
and V2 is defined as $\max\{d_h, \forall h \in \{1, 2, \ldots, q-1\}\}$. 
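
The spread and coverage metrics described above can be sketched for the two-objective case as follows. This is an interpretation under stated assumptions: the points are ordered by the first objective, no normalization of the objective space is applied (the normalization behind the reported values is not given here), and the names `spread`, `max_gap`, and `coverage` are illustrative.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def spread(points: Sequence[Point], a: Point, b: Point) -> Tuple[float, ...]:
    """Per-objective range covered by `points` as a fraction of the range
    between the extreme points a and b."""
    return tuple(
        (max(p[i] for p in points) - min(p[i] for p in points)) / abs(a[i] - b[i])
        for i in range(2)
    )


def max_gap(chain: Sequence[Point]) -> float:
    """Largest Euclidean distance between consecutive points of a chain."""
    return max(math.dist(p, q) for p, q in zip(chain, chain[1:]))


def coverage(points: Sequence[Point], a: Point, b: Point) -> Tuple[float, float]:
    """V1 includes the gaps to the extreme points; V2 uses only the gaps
    between the generated solutions (at least two points are required)."""
    v1 = max_gap(sorted(list(points) + [a, b]))
    v2 = max_gap(sorted(points))
    return v1, v2


# Hypothetical noninferior set between extreme points (0, 10) and (10, 0).
a, b = (0.0, 10.0), (10.0, 0.0)
points: List[Point] = [(2.0, 8.0), (4.0, 6.0), (5.0, 5.0)]
s1, s2 = spread(points, a, b)
v1, v2 = coverage(points, a, b)
```

In this example the solutions cluster near one end, so V2 stays small while V1 is dominated by the large gap to the uncovered extreme point, which mirrors the distinction the two metrics are meant to capture.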

These metrics are summarized in Tables 2a-2d for the noninferior 
solutions generated by all MOEAs shown in Figure 4. These results indicate that 
overall CMEA performs better than NPGA, NSGA, SPEA, and VEGA with respect to 
finding nondominated solutions with a good distribution in the noninferior region. 
This conclusion is specific to the extended 0/1 multiobjective knapsack problem, and 
similar performance comparisons for other problems are needed to draw more general 
conclusions. Although CMEA provides the best distribution of solutions in the entire 
noninferior range (based on the V1 metric), SPEA provides a better distribution (based 
on the V2 metric) within the narrower noninferior range represented by its solutions. 
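
For two maximization objectives, the size of the objective space covered by a set of solutions (the quantity behind the S factor) can be computed as the area of the union of the rectangles spanned by a reference point and each solution. The sketch below assumes the origin as reference point and returns the raw area; how the values in Table 2a were normalized is not stated here, so no normalization is attempted.

```python
from typing import Sequence, Tuple


def s_metric_2d(points: Sequence[Tuple[float, float]],
                ref: Tuple[float, float] = (0.0, 0.0)) -> float:
    """Area dominated by `points` (both objectives maximized), measured
    from the reference point `ref`. Points are assumed to lie at or above
    `ref` in both objectives."""
    area = 0.0
    best_z2 = ref[1]
    # Sweep from the largest Z1 downward; each point adds the strip of
    # Z2 values that no earlier (larger-Z1) point has already covered.
    for p1, p2 in sorted(points, reverse=True):
        if p2 > best_z2:
            area += (p1 - ref[0]) * (p2 - best_z2)
            best_z2 = p2
    return area


# Hypothetical front; the dominated point (1, 1) adds no area.
area_front = s_metric_2d([(1, 3), (2, 2), (3, 1)])
area_with_dominated = s_metric_2d([(1, 3), (2, 2), (3, 1), (1, 1)])
```

A larger covered area indicates a set that is closer to, and extends further along, the true noninferior front.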

Fig. 5. An example two-objective noninferior tradeoff to illustrate the computation of: 1) 
Spread metric, and 2) Coverage metric 



Table 2a. Accuracy comparison, based on the S factor (Zitzler and Thiele, 1999), of noninferior 
sets generated by different MOEAs for the extended 0/1 multiobjective knapsack problem. A 
larger value indicates better performance; the best is shown in bold 



MOEA Method     S Factor 
SPEA            0.89 
NSGA            0.79 
NPGA            0.83 
VEGA            0.81 
CMEA            0.92 



6 Summary and Conclusions 

This paper presents a new MOEA, CMEA (Constraint Method-Based Evolutionary 
Algorithm), for solving multiobjective optimization problems. CMEA is founded upon 
two simple but powerful concepts borrowed from the mathematical programming 
literature: 1) optimization of a single objective model with target constraints on all but 
one objective finds a noninferior, or Pareto optimal, solution; and 2) for some classes 
of problems, noninferior solutions adjacent in objective space map to adjacent 
decision vectors with only marginal differences in the decision space. The attractive 
features of CMEA are that it is easily adaptable for use with existing implementations 
of evolutionary algorithms for an optimization problem, since no new operators are 
needed, and that it is relatively less compute intensive, since Pareto optimality is 
ensured in an implicit manner and the expensive sorting and pair-wise comparison 
operations typically required by other Pareto-based MOEAs are therefore eliminated. 

Table 2b. Accuracy comparison, based on the metric defined by Knowles and Corne (2000), of 
CMEA with different MOEAs for the extended 0/1 multiobjective knapsack problem 

(P1, P2): (Percentage number of times MOEA1 outperforms MOEA2, Percentage number of 
times MOEA2 outperforms MOEA1) 

The MOEAs Compared      Number of Sampling Lines 
(MOEA1 vs. MOEA2)       108             507             1083 
(CMEA vs. SPEA)         (95.4, 4.6)     (95.3, 4.7)     (95.2, 4.8) 
(CMEA vs. NSGA)         (100, 0)        (100, 0)        (100, 0) 
(CMEA vs. NPGA)         (100, 0)        (100, 0)        (100, 0) 
(CMEA vs. VEGA)         (100, 0)        (100, 0)        (100, 0) 



Table 2c. Comparison of Spread of noninferior sets generated by different MOEAs for the 
extended 0/1 multiobjective knapsack problem. A larger value indicates better performance; 
the best is shown in bold 



            Spread Metric 
MOEA        in Z1 objective space     in Z2 objective space 
SPEA        0.28                      0.23 
NPGA        0.24                      0.25 
NSGA        0.26                      0.15 
VEGA        0.20                      0.16 
CMEA        0.78                      0.47 



Table 2d. Comparison of Coverage of noninferior sets generated by different MOEAs for the 
extended 0/1 multiobjective knapsack problem. A smaller value indicates better performance; 
the best is shown in bold 



            Coverage Metric 
MOEA        V1 (includes the extreme         V2 (excludes the extreme 
            points for each objective)       points for each objective) 
SPEA        0.118                            0.011 
NPGA        0.122                            0.016 
NSGA        0.121                            0.021 
VEGA        0.130                            0.015 
CMEA        0.088                            0.04 



To evaluate the applicability of CMEA to MO problems, it was applied to two test 
problems with different characteristics and levels of difficulty. The test problems 
covered continuous as well as combinatorial search, unconstrained as well as 
constrained optimization, real as well as binary variables, and from as few as one 
variable to as many as 500 variables. This evaluation included performance 
comparisons with other MOEAs and with mathematical programming-based 
noninferior solutions. Accuracy, coverage, and spread of the noninferior solutions 
were used to compare the performance. To evaluate the consistency of CMEA in 
generating the noninferior set, several random trials were performed. Overall, CMEA 
performed well with respect to these criteria for both problems tested. The spread and 
coverage of noninferior solutions obtained using CMEA were always better than those 
demonstrated by other MOEAs. With respect to accuracy, SPEA did better than 
CMEA for a narrow range of noninferior solutions, but overall CMEA did better over 
a broader range of the noninferior set. 

Some known limitations of CMEA include the following. The computational 
efficiency gain obtained in CMEA is predicated on the existence of similarities in 
noninferior solutions that correspond to adjacent points in the objective space. For 
problems where this does not hold strongly, the search implemented by CMEA 
becomes analogous to solving a number of independent single objective optimization 
problems, and therefore may not realize any significant computational gain. For a 
problem with more than two objectives, incrementally updating the constraint values 
to obtain an adjacent point is not necessarily as straightforward as it is for the 
two-objective cases presented here. More investigation is needed to evaluate this 
issue when applying CMEA to higher dimensional problems. 

In the present study, the number of function evaluations was used as a measure to 
compare the computational needs of CMEA with those of a single objective-based MO 
analysis without seeding. CMEA required approximately 75% fewer function 
evaluations. This measure alone is not sufficient to compare the computational gain, if 
any, that may be realized by CMEA over the other MOEAs that use explicit Pareto 
optimality ordering. The computational performance of CMEA and other MOEAs 
needs to be studied further. 

Abstract. We consider the problem of exploration of the set of all global 
optima (Pareto-points) or an approximation thereof in the context of 
multi-objective function optimization. Up to now, set oriented techniques 
assume that the evaluation of the m-dimensional vector of objectives can 
be done exactly, which is important to steer the search process towards 
global optima. Here, we extend such techniques to allow objectives to 
be uncertain, i.e., vary within intervals. This may often be the case if 
the exact computation of objectives is computationally too expensive 
such that only estimates on the objective values of a design point may 
be derived. For objective values that are constrained by intervals, we 
derive a theory of probabilistic dominance, an extension of the definition 
of Pareto-dominance. Also, we show how this theory may be used in 
order to guide the selection process to approximate the Pareto-set. 



1 Introduction 

The need for design space exploration on different levels of abstraction has re- 
ceived wide attention in many different domains today, e.g., in function mini- 
mization, during synthesis of electronic systems, etc. The optimization problem 
is typically multi objective in nature and m design objectives have to be consid- 
ered simultaneously. Also, not one global optimum is sought in many cases, but 
a set of so-called Pareto-points. In order to explore the front of Pareto-optimal 
points, set-oriented optimization techniques have been investigated recently, see, 
e.g., 12] for continuous problems, and many population-based techniques based 
on evolutionary algorithms, see e.g., 0’ CH]; that are applicable also to discrete 
problems. In both cases, the success of exploration depends heavily on the pre- 
ciseness of the computed objective functions because these values are used for 
defining the Pareto-dominance criterion. 

In reality, however, the objective function is often only an approximation 
of the real objective. Such approximations are necessary either because of 
unknown model parameters or computational cost, or because the objectives are 
based on estimations, there being no analytical way to compute the costs exactly. 

* This work was supported by the Deutsche Forschungsgemeinschaft (DFG) under 
grant Te 163/5 

In such cases, it is questionable whether a design point dominates another 
point or not. It turns out, however, that in many cases, the error in estimating 
an objective may be bounded from below and above such that the objective may 
be constrained by an interval. Such abstractions are often met, e.g., in evaluating 
the cost of the design of a technical system or in the latency of a schedule. 

In this paper, we extend existing techniques for design space exploration 
based on the Pareto-dominance criterion to the case where one or more objectives 
of an m-dimensional vector function are subject to uncertainties and given by 
so-called property intervals. 

By treating an objective $y = f(x)$ on an n-dimensional design point x as 
a random variable Y, we consider the case of a continuous uniform distribution 
of $Y \in [f^s(x), \ldots, f^u(x)]$ with $f^s(x), f^u(x) \in \mathbb{R}$. In 
this case, we assume that the probability density p(y) is equal and constant 
over the property interval. 

In this paper, we derive formulas for the probability of dominance of a design 
point for a single objective ($m = 1$) and continuous distributions, and extend 
this result to the multi-objective case ($m > 1$). This leads to a probabilistic 
dominance criterion for Pareto-points. The case of discrete distributions will 
be described in a forthcoming paper. 

We present an evolutionary multi-objective algorithm based on an extension 
of the SPEA algorithm reported by Zitzler and Thiele. This algorithm is called 
ESPEA (Estimate Strength Pareto Evolutionary Algorithm). A technical case 
study introducing a hardware/software partitioning problem is used to 
provide first results using ESPEA. 

2 Definitions 

2.1 Multiobjective Optimization 

First, we give the basic notation for multiobjective optimization problems. 

Definition 1 (Multiobjective Optimization Problem (MOP)). A MOP 
includes a set of n parameters (decision variables), a set of m objective 
functions, and a set of k constraints. Objective functions and constraints are 
functions of the decision variables. The optimization goal is to 

    minimize    $y = f(x) = (f_1(x), f_2(x), \ldots, f_m(x))$ 
    subject to  $e(x) = (e_1(x), e_2(x), \ldots, e_k(x)) \le 0$ 

where $x = (x_1, x_2, \ldots, x_n) \in X$ and $y = (y_1, y_2, \ldots, y_m) \in Y$. 
Here, x is called the decision vector, y is the objective vector, X is denoted 
as the decision space, and Y is called the objective space. 

The constraints $e(x) \le 0$ determine the set of feasible solutions. 

Definition 2 (Feasible set). The feasible set $X_f$ is defined as the set of 
decision vectors x that satisfy the constraints e(x): 

    $X_f = \{x \in X \mid e(x) \le 0\}$    (1) 

The image of $X_f$, i.e., the feasible region in the objective space, is 
denoted as 

    $Y_f = f(X_f) = \bigcup_{x \in X_f} \{f(x)\}$. 

With this definition, we are able to define Pareto-dominance for any two 
decision vectors a and b as follows: 

Definition 3 (Pareto-dominance). For any two decision vectors a and b, 

    $a \succ b$    (a dominates b)           iff  $f(a) < f(b)$ 
    $a \succeq b$  (a weakly dominates b)    iff  $f(a) \le f(b)$ 
    $a \sim b$     (a is indifferent to b)   iff  $f(a) \not\le f(b) \wedge f(b) \not\le f(a)$ 

In this definition, the relations $=$, $\le$, and $<$ on objective vectors are 
defined as follows: 

Definition 4. For any two objective vectors u and v, 

    $u = v$    iff  $\forall i = 1, \ldots, m : u_i = v_i$ 
    $u \le v$  iff  $\forall i = 1, \ldots, m : u_i \le v_i$    (2) 
    $u < v$    iff  $u \le v \wedge u \ne v$ 

The relations $\ge$ and $>$ are defined analogously. 

Definition 5 (Pareto-optimality). A decision vector $x \in X_f$ is said to be 
non-dominated regarding a set $A \subseteq X_f$ iff 

    $\nexists\, a \in A : a \succ x$ 

If it is clear from the context which set A is meant, it will simply be omitted 
in the following. Moreover, x is said to be Pareto-optimal iff x is 
non-dominated regarding $X_f$. 

The entirety of all Pareto-optimal points is called the Pareto-optimal set, the 
corresponding objective vectors form the Pareto-optimal front or surface. 

Definition 6 (Non-dominated sets and fronts). Let $A \subseteq X_f$. The 
function g(A) gives the set of non-dominated decision vectors in A: 

    $g(A) = \{a \in A \mid a \text{ is non-dominated regarding } A\}$    (3) 

The set g(A) is the non-dominated set regarding A, the corresponding set of 
objective vectors f(g(A)) is the non-dominated front regarding A. Furthermore, 
the set $X_g = g(X_f)$ is called the Pareto-optimal set and the set 
$Y_g = f(X_g)$ is denoted as the Pareto-optimal front. 
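
The relations of Definitions 3 to 6 can be sketched in a few lines of Python. 
This is only a minimal illustration, assuming minimization and objective 
vectors given directly as tuples (i.e., f has already been applied); the 
function names are hypothetical and not taken from the paper. 

```python
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]


def weakly_dominates(u: Sequence[float], v: Sequence[float]) -> bool:
    """u <= v in every component (minimization)."""
    return all(ui <= vi for ui, vi in zip(u, v))


def dominates(u: Sequence[float], v: Sequence[float]) -> bool:
    """u <= v in every component and u differs from v."""
    return weakly_dominates(u, v) and tuple(u) != tuple(v)


def indifferent(u: Sequence[float], v: Sequence[float]) -> bool:
    """Neither vector weakly dominates the other."""
    return not weakly_dominates(u, v) and not weakly_dominates(v, u)


def nondominated(points: Sequence[Vector]) -> List[Vector]:
    """g(A): the members of A that no other member of A dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


front = nondominated([(1, 5), (2, 2), (3, 3), (5, 1)])
print(front)
```

In this small set, (3, 3) is dominated by (2, 2), while the remaining three 
points are mutually indifferent and therefore form the non-dominated front. 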
2.2 Random Objectives and Property Intervals 

In the following, we define the notion of probabilistic dominance of decision vec- 
tors for which each objective is not a number, but a random variable with values 
bounded by an interval called property interval. 

First, we assume that our optimization problem has a single objective 
($m = 1$), hence $y = f(x)$ for a given n-dimensional decision vector. Later, 
we extend our notation to the general case. 

Definition 7 (Uncertain objective; property interval). Given an n-dimensional 
decision vector and a one-dimensional objective function f, we assume 
$f : \mathbb{R}^n \to \mathbb{R}$. f is called uncertain if, for each $x \in X$, 

    $y = f(x) \in [f^s(x), \ldots, f^u(x)]$    (4) 

where $[f^s(x), \ldots, f^u(x)]$ with $f^s(x), f^u(x) \in \mathbb{R}$ and 
$f^s(x) \le f^u(x)$ is called the property interval of f at x.¹ 

Example 1. In Fig. 1, there are three decision vectors a, b, and c that have 
different property intervals. Without loss of generality, let the objective 
function be minimized. Then a obviously dominates c because no matter what 
value the objective of a assumes in the shown interval, the value is lower 
than any objective value in the interval of c. The situation is different for 
the pair a and b: In case f(a) = 2 and f(b) = 8, a dominates b. However, in 
case f(a) = 10, then no matter what value $f(b) \in [8, 9]$ may take, b 
dominates a. Hence, the dominance is uncertain. 



[Figure: property intervals of a, b, and c over an objective axis ranging from 1 to 15.] 

Fig. 1. Example of three decision vectors with corresponding property intervals. 

From the previous example, it becomes clear that in order to make decisions 
concerning the dominance of decision vectors with uncertain objective functions 
that are given by property intervals, we need a notion of probabilistic dominance. 
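
The distinction drawn in Example 1 can be checked mechanically. The sketch 
below assumes a single objective to be minimized and the intervals 
a = [1, 11], b = [8, 9], and c = [12, 15] that the later examples use for 
Fig. 1; the helper name and its return strings are hypothetical. 

```python
def compare_intervals(first, second):
    """Classify two property intervals (lower, upper) of a minimized objective.

    Dominance is certain only if the intervals do not overlap; otherwise the
    outcome depends on the actual objective values and is uncertain.
    """
    if first[1] < second[0]:
        return "first dominates"
    if second[1] < first[0]:
        return "second dominates"
    return "uncertain"


a, b, c = (1, 11), (8, 9), (12, 15)
print(compare_intervals(a, c))  # intervals are disjoint
print(compare_intervals(a, b))  # intervals overlap
```

Only the overlapping pair requires a probabilistic treatment, which is what 
the following sections develop. 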

In the following, we treat f(x) for each $x \in X$ as a distinct random variable 
Y and consider continuous uniform distributions of $Y \in [f^s(x), \ldots, f^u(x)]$. 

3 Continuous Uniform Distribution of Objective Values 

In this case, we assume that the probability density function is constant over 
the property interval of each random variable $Y = f(x)$, $x \in X$. 



¹ W.l.o.g., we assume $f^u(x) = f^s(x) + \varepsilon$ with $\varepsilon \in \mathbb{R}^{+}$ in the following. 
Definition 8 (Probability density; uniform distribution). Let 
$[y^s, \ldots, y^u]$ denote the property interval of the random variable 
$Y = f(x)$. For a uniform distribution, its probability density function is 
given by 

    $p(y) = \begin{cases} 0 & \text{if } y < y^s \\ \frac{1}{y^u - y^s} & \text{if } y^s \le y \le y^u \\ 0 & \text{if } y > y^u \end{cases}$    (5) 

From the probability density function p(y), we can derive the cumulative 
distribution function (cdf), or just distribution function, F(y), as 
$F(y) = P[Y \le y]$ for all $y \in \mathbb{R}$. With 
$F(y) = \int_{-\infty}^{y} p(y)\,dy$, we obtain the probability of 
$[a < y \le b]$ as $P[a < Y \le b] = F(b) - F(a) = \int_{a}^{b} p(y)\,dy$. 
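
As a small illustration of Definition 8 and the distribution function, the 
following sketch evaluates the density, the cdf, and the probability of a 
sub-interval for one property interval. It assumes a non-degenerate interval 
(upper bound strictly larger than lower bound); all function names are 
hypothetical. 

```python
def uniform_pdf(y, lo, hi):
    """Density of Eq. (5): constant inside [lo, hi], zero outside."""
    return 1.0 / (hi - lo) if lo <= y <= hi else 0.0


def uniform_cdf(y, lo, hi):
    """F(y) = P[Y <= y] for the uniform distribution on [lo, hi]."""
    if y <= lo:
        return 0.0
    if y >= hi:
        return 1.0
    return (y - lo) / (hi - lo)


def prob_between(a, b, lo, hi):
    """P[a < Y <= b] = F(b) - F(a)."""
    return uniform_cdf(b, lo, hi) - uniform_cdf(a, lo, hi)


# Property interval [1, 11] of decision vector a from Fig. 1.
print(uniform_pdf(5.0, 1.0, 11.0))
print(prob_between(3.0, 6.0, 1.0, 11.0))
```

For the interval [1, 11], the density is 0.1 everywhere inside the interval, 
so the sub-interval (3, 6] of length 3 receives probability 0.3. 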

Example 2. Consider again the property intervals of the decision vectors a, b, 
and c in Fig. 1. For uniformly distributed objective values, the probability 
density functions of the corresponding random variables A = f(a), B = f(b), 
and C = f(c) are shown in Fig. 2 together with the probability distribution 
functions $P[Y \le y]$. 




Fig. 2. Probability density and probability distribution functions (uniform distribu- 
tion). 



3.1 Probability of Dominance: Single-Objective Case 

In the following, we elaborate the probability of a design point a with objective 
value $a \in [a^s, \ldots, a^u]$ to dominate a design point b with objective 
value $b \in [b^s, \ldots, b^u]$, again first for a single objective. 

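Theorem 1 (Probability of (weak) Dominance; uniform distribution). 
Given two design points a and b with objective values $a \in [a^s, \ldots, a^u]$ 
and $b \in [b^s, \ldots, b^u]$, respectively. The probability of a to dominate 
b, written $P[a \succeq b]$, for uniform distribution functions is given as: 

    $P[a \succeq b] = \begin{cases} 0 & \text{if } b^u < a^s \\ 1 & \text{if } a^u < b^s \\ \frac{1}{a^u - a^s} \left( \int_{y=a^s}^{\max\{a^s, b^s\}} dy + \int_{y=\max\{a^s, b^s\}}^{\min\{a^u, b^u\}} \frac{b^u - y}{b^u - b^s}\, dy \right) & \text{else} \end{cases}$    (6) 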
Proof. The first and the second case in Eq. (6) are obvious and correspond to 
the case when both property intervals do not overlap. The last case is slightly 
more complicated and corresponds to the case when both intervals overlap. 
The probability that the random variable A corresponding to the objective of a 
is smaller than or equal to B is given as 

    $P[A \le B] = \int_{y=a^s}^{a^u} \frac{1}{a^u - a^s} \cdot P[B > y]\, dy$ 

With $P[B > y]$ being equal to 1 for values of $y < b^s$ and being equal to 
$1 - P[B \le y] \approx 1 - P[B < y] = 1 - \frac{y - b^s}{b^u - b^s}$ for values 
of y in the range from $b^s$ to $b^u$, we obtain the above results. 

Example 3. Consider the two decision vectors a and b introduced in Fig. 1. We 
want to compute the probability $P[a \succeq b]$, which is equal to 
$P[A \le B]$ for the corresponding random variables of the related property 
intervals. Evaluating Eq. (6) with $a^s = 1$, $a^u = 11$, $b^s = 8$, and 
$b^u = 9$, we obtain $\frac{1}{a^u - a^s} \cdot (b^s - a^s) = \frac{7}{10}$ for 
the first term and $\frac{1}{a^u - a^s} \int_{8}^{9} \frac{b^u - y}{b^u - b^s}\, dy$ 
for the second term, resulting in 
$\frac{1}{10} \cdot \frac{9(9 - 8) - \frac{1}{2}(9^2 - 8^2)}{9 - 8} = 0.05$, 
giving a total of 0.75. Hence, the probability of a (weakly) dominating b is 
0.75. 
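
The closed form of Eq. (6) can be evaluated directly. The following sketch 
assumes uniformly distributed objective values on non-degenerate intervals 
given as (lower, upper) pairs; the function name is hypothetical. It 
reproduces the value of Example 3. 

```python
def prob_weakly_dominates(a, b):
    """P[A <= B] for independent uniform A on a = (a_s, a_u), B on b = (b_s, b_u)."""
    a_s, a_u = a
    b_s, b_u = b
    if b_u < a_s:          # b lies completely below a
        return 0.0
    if a_u < b_s:          # a lies completely below b
        return 1.0
    lo = max(a_s, b_s)
    hi = min(a_u, b_u)
    # First integral of Eq. (6): region where P[B > y] = 1.
    first = lo - a_s
    # Second integral: integral of (b_u - y) / (b_u - b_s) from lo to hi.
    second = (b_u * (hi - lo) - 0.5 * (hi ** 2 - lo ** 2)) / (b_u - b_s)
    return (first + second) / (a_u - a_s)


p_ab = prob_weakly_dominates((1, 11), (8, 9))
p_ba = prob_weakly_dominates((8, 9), (1, 11))
print(p_ab, p_ba)
```

With the intervals of Fig. 1, the two probabilities are 0.75 and 0.25; they 
add up to one because ties have probability zero for continuous distributions. 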



Expected value 

Definition 9. Let $[y^s, \ldots, y^u]$ denote the property interval of the 
random variable $Y = f(x)$. For a uniform distribution, its expected value is 
given by 

    $E[Y] = \int_{-\infty}^{\infty} y\, p(y)\, dy = \int_{y^s}^{y^u} \frac{y}{y^u - y^s}\, dy = \frac{(y^u)^2 - (y^s)^2}{2\,(y^u - y^s)} = \frac{y^s + y^u}{2}$    (7) 



Example 4. Consider again the three property intervals of decision vectors a, b, 
and c as shown in Fig. 1. Let the corresponding random variables be A, B, and 
C. For an assumed uniform distribution, the expected values are computed using 
Eq. (7) as $E[A] = \frac{1}{2(a^u - a^s)} ((a^u)^2 - (a^s)^2) = \frac{1}{2(11 - 1)} (11^2 - 1^2) = 6$. 
Similarly, we obtain $E[B] = \frac{1}{2(9 - 8)} (9^2 - 8^2) = 8.5$ and 
$E[C] = \frac{1}{2(15 - 12)} (15^2 - 12^2) = 13.5$. These values are exactly in 
the middle of the corresponding intervals. 
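
The computation of Example 4 is short enough to verify in code. The sketch 
below evaluates the quadratic form of Eq. (7) and compares it with the 
interval midpoint; the function name is hypothetical. 

```python
def expected_value(lo, hi):
    """E[Y] for a uniform distribution on [lo, hi], in the form of Eq. (7)."""
    return (hi ** 2 - lo ** 2) / (2.0 * (hi - lo))


intervals = {"a": (1, 11), "b": (8, 9), "c": (12, 15)}
expected = {name: expected_value(lo, hi) for name, (lo, hi) in intervals.items()}
print(expected)
```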



3.2 Probability of Dominance: Multi-objective Case 

Up to now, we have seen how to compute probabilities of dominance in the case 
of single-objective functions. Fig. 3 shows that the question of dominance 
becomes even more involved in the case of multiple objectives. We extend the 
notion of (weak) Pareto-dominance.³ 

² To be exact, we must integrate only up to $b^u - \varepsilon$ where 
$\varepsilon \to 0$, positive, in the above equation if $b^u$ determines the 
upper bound of the integration interval. 

³ We consider the notion of weak dominance here and throughout the rest of the paper. 
Fig. 3. Pareto-fronts with 2D property intervals. Pareto-points surrounded by dark 
rectangles. 



Theorem 2 (Probability of (weak) Pareto-dominance). For any two 
n-dimensional decision vectors a and b, and m statistically independent 
objective functions $f_1, f_2, \ldots, f_m$, 

    $P[a \succeq b] = \prod_{i=1}^{m} P[f_i(a) \le f_i(b)]$    (8) 

Proof. We assume that the random variables associated to each objective 
and each decision vector are mutually independent. Hence, the conjunction of 
the Pareto-conditions leads to the product of the corresponding probabilities. 

Before we make use of this probability calculus of dominance in the 
context of steering design space exploration, we give a few more examples. 

Example 5. Consider two decision vectors a and b (n = 2) with two objectives 
$f_1$ and $f_2$. Let the corresponding property intervals be given by 
$f_1(a) \in [2, 4]$, $f_2(a) \in [2, 4]$, $f_1(b) \in [2, 3]$, and 
$f_2(b) \in [2, 3]$. We obtain 
$P[a \succeq b] = P[f_1(a) \le f_1(b)] \cdot P[f_2(a) \le f_2(b)]$ where 
$P[f_1(a) \le f_1(b)] = \frac{1}{2} \left( 3(3 - 2) - \frac{1}{2}(3^2 - 2^2) \right) = 0.25$, 
and also 
$P[f_2(a) \le f_2(b)] = \frac{1}{2} \left( 3(3 - 2) - \frac{1}{2}(3^2 - 2^2) \right) = 0.25$. 
Hence, $P[a \succeq b] = 0.0625$. 

Similarly, we obtain 
$P[b \succeq a] = P[f_1(b) \le f_1(a)] \cdot P[f_2(b) \le f_2(a)]$ where 
$P[f_1(b) \le f_1(a)]$ equals 
$P[f_2(b) \le f_2(a)] = 2(3 - 2) - \frac{1}{4}(3^2 - 2^2) = 0.75$. 
Hence, $P[b \succeq a] = 0.5625$. 
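
Theorem 2 reduces the multi-objective case to a product of single-objective 
probabilities. The sketch below assumes independent, uniformly distributed 
objectives on non-degenerate intervals and reproduces the two values of 
Example 5; both function names are hypothetical. 

```python
from math import prod


def prob_leq(a, b):
    """P[A <= B] for independent uniform A on a = (a_s, a_u), B on b = (b_s, b_u)."""
    a_s, a_u = a
    b_s, b_u = b
    if b_u < a_s:
        return 0.0
    if a_u < b_s:
        return 1.0
    lo, hi = max(a_s, b_s), min(a_u, b_u)
    second = (b_u * (hi - lo) - 0.5 * (hi ** 2 - lo ** 2)) / (b_u - b_s)
    return ((lo - a_s) + second) / (a_u - a_s)


def prob_pareto_dominates(a_intervals, b_intervals):
    """Eq. (8): product over all objectives of P[f_i(a) <= f_i(b)]."""
    return prod(prob_leq(ai, bi) for ai, bi in zip(a_intervals, b_intervals))


a = [(2, 4), (2, 4)]
b = [(2, 3), (2, 3)]
p_ab = prob_pareto_dominates(a, b)
p_ba = prob_pareto_dominates(b, a)
print(p_ab, p_ba)
```

Note that the two probabilities do not add up to one: with more than one 
objective, there is a remaining probability that a and b are indifferent. 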
The general case of not necessarily independent objective functions is dealt 
with in the following theorem: 

Theorem 3 (Probability of (weak) Pareto-dominance). For any two 
n-dimensional decision vectors a and b, and m objective functions 
$f_1, f_2, \ldots, f_m$, let $p_a(X) = p_a(X_1, X_2, \ldots, X_m)$ and 
$p_b(Y) = p_b(Y_1, Y_2, \ldots, Y_m)$ denote the probability density functions 
of a and b, respectively, in the objective space, with $X_i$ denoting the 
random variable for the ith objective $f_i(a)$ (similarly $Y_i$ for $f_i(b)$). 
Let the property interval for the ith objective of a be given by 
$[a_i^s, \ldots, a_i^u]$ (similarly $[b_i^s, \ldots, b_i^u]$ for b). Then 

    $P[a \succeq b] = P[a_1 \le b_1, a_2 \le b_2, \ldots, a_m \le b_m]$ 

    $= \int_{X_1 = a_1^s}^{a_1^u} \cdots \int_{X_m = a_m^s}^{a_m^u} p_a(X) \cdot \left( 1 - \int_{Y_1 = \max\{a_1^s, b_1^s\}}^{X_1} \cdots \int_{Y_m = \max\{a_m^s, b_m^s\}}^{X_m} p_b(Y)\, dY \right) dX$    (9) 



Proof. In analogy to Theorem 1, $P[a \succeq b]$ is obtained as 
$\int_{X_1 = a_1^s}^{a_1^u} \cdots \int_{X_m = a_m^s}^{a_m^u} p_a(X) \cdot P[X \le b]\, dX$. 
With $P[X \le b] = 1 - P[X > b] \approx 1 - P[X \ge b]$,⁴ we obtain the above 
result. 

It can be seen that in case $p_a(X) = p_a(X_1) \cdot p_a(X_2) \cdots p_a(X_m)$ 
and similarly $p_b(Y) = p_b(Y_1) \cdot p_b(Y_2) \cdots p_b(Y_m)$ hold, the 
result in Theorem 2 may be directly obtained as a special case. 
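
When the objectives are not independent, the multiple integral is usually not 
available in closed form. As a swapped-in numerical alternative (plain Monte 
Carlo sampling, which is not the method of this paper), $P[a \succeq b]$ can 
be estimated from joint samples of the objective vectors. The sketch below is 
hypothetical throughout: the sampler interface, the function names, and the 
sample size are assumptions made for illustration. For a check, it is applied 
to the independent uniform intervals of Example 5. 

```python
import random


def mc_prob_dominates(sample_a, sample_b, n=100_000, seed=0):
    """Estimate P[a weakly dominates b] as the fraction of joint samples
    in which every objective of a is <= the corresponding objective of b."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        x = sample_a(rng)
        y = sample_b(rng)
        if all(xi <= yi for xi, yi in zip(x, y)):
            hits += 1
    return hits / n


def independent_uniform(intervals):
    """Sampler for independent uniform objectives; a dependent model would
    replace this function by one that draws correlated components."""
    return lambda rng: [rng.uniform(lo, hi) for lo, hi in intervals]


estimate = mc_prob_dominates(
    independent_uniform([(2, 4), (2, 4)]),   # objectives of a
    independent_uniform([(2, 3), (2, 3)]),   # objectives of b
)
print(estimate)
```

For this independent case the estimate should be close to the product value 
0.0625 obtained in Example 5; the estimator itself makes no independence 
assumption, since it only needs joint samples. 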



Expected value. The definition of the expected value for the single-objective 
case in Definition 9 directly extends to the multi-objective case as follows: 

Definition 10. For any n-dimensional decision vector and m objective 
functions $f_1, f_2, \ldots, f_m$, let $Y = (Y_1, Y_2, \ldots, Y_m)$ denote the 
vector of random variables. Then 

    $E[Y] = (E[Y_1], E[Y_2], \ldots, E[Y_m])$    (10) 

4 Evolutionary Algorithm-Based Exploration with 
Property Intervals 

In this section, we review basic ideas for design space exploration using 
evolutionary algorithms. SPEA (Strength Pareto Approach), introduced by Zitzler 
and Thiele, has been applied to many important multi-objective optimization 
problems in the context of embedded system design, such as system-level 
synthesis and code scheduling problems for DSPs (digital signal processors). 
We provide extensions to the selection mechanisms of this population-based 
approach that consider objective values with property intervals. 

⁴ To be exact, we must again integrate only up to $b_i^u - \varepsilon$ where 
$\varepsilon \to 0$, positive, for all those $i = 1, \ldots, m$ in Eq. (9) for 
which $b_i^u$ determines the upper bound of the integration interval. 
4.1 Basic Principles of Evolutionary Algorithms 

Evolutionary Algorithms (EA) denote a class of optimization methods that are 
characterized by a) a set of solution candidates (individuals), called 
population, that is maintained during the optimization process, b) a selection 
process that the individuals undergo, and c) genetic operators, usually 
recombination and mutation, by which they are manipulated. Each individual 
represents a possible solution, i.e., a decision vector, to the problem under 
consideration. However, it is not a decision vector but rather encodes it 
using a particular data structure.⁵ The set of all possible vectors 
constitutes the individual space I. A population is thus a (multi-)set of 
vectors $i \in I$. The quality of an individual with respect to the 
optimization problem is represented by a scalar value, called fitness. Since 
the quality of a solution is related to the objective functions and 
constraints, the solution must be decoded first by a mapping function m to 
derive a decision vector $x = m(i)$ from i. Applying f to x yields the 
corresponding objective vector on the basis of which a fitness value is 
assigned to i. Natural evolution is simulated by an iterative process: In the 
selection process, low-quality individuals are removed from the population, 
while high-quality individuals are reproduced. The goal of this operator is to 
increase the average quality within the population. Recombination and mutation 
aim at generating new solutions within the search space by variation of 
existing ones. The population at iteration t (also called generation t) is 
denoted $M_t$ in the following. The symbol + stands for multi-set union in 
conjunction with populations. 



4.2 The Strength Pareto-Approach (SPEA) 

In the following, we review the Strength Pareto-Approach (SPEA) as introduced 
by Zitzler and Thiele, which serves as the basis of our extension called 
ESPEA. SPEA uses Pareto-dominance-based selection instead of switching the 
objective during optimization or using an aggregation function, e.g., a 
weighted sum approach where the coefficients are changed during optimization. 
Also, it uses elitism, where the policy is to always include the B best 
individuals of $M_t$ into $M_{t+1}$ in order not to lose them during 
exploration, as suggested by De Jong. Finally, in order to maintain a high 
diversity within the population, it uses a special clustering technique, shown 
in Fig. 6, that is different from niching and crowding. Altogether, SPEA has 
been shown to provide superior results compared to existing approaches for 
many problems of interest. 

As it is our main goal to extend this algorithm to be able to deal with 
objectives bounded by property intervals, we briefly reproduce the algorithm 
here in Fig. 4. 



⁵ Without loss of generality, we assume that this structure is a vector, e.g., a bit vector 
or a real-valued vector, although other structures like trees might be considered as 
well. 
SPEA 

IN:   N: population size; 
      $\bar{N}$: maximum size of external set; 
      U: maximum number of generations; 
      $p_c$: crossover probability; 
      $p_m$: mutation rate; 
OUT:  A: non-dominated set; 

BEGIN 
  Step 1: Initialization: 
          Set $M_0 = \bar{M}_0 = \emptyset$ and t = 0. 
          FOR i = 1 TO N DO 
              Choose i according to some probability distribution; 
              $M_0 := M_0 + \{i\}$; 
  Step 2: Update of external set: 
          Set $\bar{M}' = \bar{M}_t$; 
          $\bar{M}' = \bar{M}' + \{i \mid i \in M_t \wedge m(i) \in g(m(M_t))\}$; 
          Remove individuals from $\bar{M}'$ whose corresponding decision 
          vectors are weakly dominated regarding $m(\bar{M}')$; 
          Reduce the number of individuals externally stored by means of 
          clustering, i.e., call Alg. Cluster with parameters $\bar{M}'$ and 
          $\bar{N}$, and assign the resulting reduced set to $\bar{M}_{t+1}$; 
  Step 3: Fitness assignment: 
          Calculate the fitness of individuals in $M_t$ and $\bar{M}_t$ using 
          Alg. Fitness; 
  Step 4: Selection: Set $M' = \emptyset$; 
          FOR i = 1 TO N DO 
              Select two individuals $i, j \in M_t + \bar{M}_t$ at random; 
              IF F(i) < F(j) 
                  $M' := M' + \{i\}$; 
              ELSE 
                  $M' := M' + \{j\}$; 
  Step 5: Recombination: ... 
  Step 6: Mutation: ... 
  Step 7: Termination: Set $M_{t+1} = M'''$ and t = t + 1; 
          IF $t \ge U$ 
              $A = g(m(\bar{M}_t))$; 
          ELSE GOTO Step 2. 
END 



Fig. 4. Zitzler’s Strength Pareto EA (SPEA). 
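
Step 4 of the listing is a binary tournament on the union of population and 
external set, where the individual with the lower fitness value wins. A 
minimal sketch of this step is given below; it assumes that individuals are 
hashable identifiers and that fitness values have already been assigned. The 
function name and the data layout are hypothetical. 

```python
import random


def binary_tournament(pool, fitness, n, rng):
    """Fill a mating pool of size n by repeated binary tournaments.

    pool    -- list of individuals (population plus external set)
    fitness -- dict mapping each individual to its fitness (lower is better)
    """
    mating_pool = []
    for _ in range(n):
        i = rng.choice(pool)
        j = rng.choice(pool)
        mating_pool.append(i if fitness[i] < fitness[j] else j)
    return mating_pool


rng = random.Random(1)
pool = ["x1", "x2", "x3", "e1"]
fitness = {"x1": 1.4, "x2": 1.9, "x3": 1.4, "e1": 0.4}
selected = binary_tournament(pool, fitness, n=6, rng=rng)
print(selected)
```

Because members of the external set receive fitness values below one and 
population members values of at least one, external individuals win every 
tournament against population members. 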



4.3 ESPEA 

We call our extension of SPEA to deal with estimated objective values bounded 
by intervals ESPEA - Estimate Strength Pareto Evolutionary Algorithm. 
First, two variants to deal with property intervals seem to be appropriate: 

- Expected value computation: A straightforward method to deal with 
  objectives with uncertain values is to leave SPEA unmodified and work 
  directly with the vectors of expected values E[Y] of the objective values. 
  The advantage of this approach is that no changes have to be made in the 
  code of SPEA; only code to calculate the expected values for each decision 
  vector has to be added, based on the knowledge of the property intervals 
  and probability density functions used. As this first variant is obvious, 
  we concentrate here on the second approach. 

- ESPEA: Instead of working with expected values, we modify SPEA 
  accordingly. Within SPEA, changes are needed in Step 2 (update of the 
  external set, see Fig. 4) and in the two algorithms Cluster (see Fig. 6) 
  and Fitness (see Fig. 5). These changes are described next. 

Update of external set. We describe changes to SPEA to deal with random 
variables caused by property intervals. First, we consider Step 2 of SPEA. 
Here, the external set is updated to include those members of the current 
population $M_t$ that are non-dominated with respect to $M_t$. With property 
intervals, we are only able to state the probability of a vector m(i) to be 
dominated or not. 
Given $P[m(j) \succeq m(i)]$, we define 

    $R(i) = \frac{1}{|M_t| - 1} \sum_{j \in M_t,\, j \ne i} P[m(j) \succeq m(i)]$ 

Each of the following three rules seems to be appropriate to decide whether 
a vector m(i) should be taken over into the external set: 

- Add m(i) to $\bar{M}'$ if $R(i) < \alpha$; this condition adds all vectors 
  whose probability of being dominated is smaller than $\alpha$, e.g., let 
  $\alpha = 0.25$. Or, 

- add those $\beta$ % of vectors i with smallest R(i). Or, 

- those $\gamma$ individuals, $\gamma \in \mathbb{N}$, with smallest R(i) are 
  copied into the external set. 
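
The three rules differ only in how a cut is made on the values R(i). The 
sketch below illustrates the threshold rule and the fixed-count rule (the 
percentage rule is the fixed-count rule with a count derived from the 
population size). It assumes that R(i) is the average, over all other members 
j, of the probability that j weakly dominates i; this normalization, the 
matrix layout, and all names are assumptions made for illustration. The 
example matrix uses the single-objective intervals a = [1, 11], b = [8, 9], 
and c = [12, 15] of Fig. 1. 

```python
def dominated_score(p, i):
    """R(i): average of p[j][i] = P[m(j) weakly dominates m(i)] over all j != i."""
    n = len(p)
    return sum(p[j][i] for j in range(n) if j != i) / (n - 1)


def select_by_threshold(p, alpha):
    """Rule 1: indices i with R(i) < alpha."""
    return [i for i in range(len(p)) if dominated_score(p, i) < alpha]


def select_k_smallest(p, k):
    """Rule 3: the k indices with smallest R(i)."""
    order = sorted(range(len(p)), key=lambda i: dominated_score(p, i))
    return order[:k]


# Rows and columns are ordered a, b, c; entry [j][i] is P[j dominates i].
p = [
    [1.00, 0.75, 1.00],
    [0.25, 1.00, 1.00],
    [0.00, 0.00, 1.00],
]
scores = [dominated_score(p, i) for i in range(3)]
print(scores)
print(select_by_threshold(p, alpha=0.25))
print(select_k_smallest(p, k=2))
```

With these values, only a passes the threshold 0.25, whereas the fixed-count 
rule with a count of two also admits b. 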

After updating the external set in either of the above ways, weakly dominated 
individuals i must be removed regarding $m(\bar{M}')$. Again, we have to deal 
with probabilities. We propose the following similar strategy: 

Let 

    $R(j) = \frac{1}{|\bar{M}'| - 1} \sum_{i \in \bar{M}',\, i \ne j} P[m(i) \succeq m(j)]$ 

Then eliminate 

- those j from $\bar{M}'$ where $R(j) > \alpha'$ (e.g., $\alpha' = 0.9$), or 

- those $\beta'$ % vectors j with largest R(j), or 

- those $\gamma'$ vectors j with largest R(j). 

The last step concerns the clustering algorithm called in Step 2 of SPEA, see 
Fig. 6. 
Fitness 

IN:   $M_t$: population; 
      $\bar{M}_t$: external set; 
OUT:  F: fitness; 

BEGIN 
  Step 1: Strength and fitness assignment external set: 
          FOR EACH $i \in \bar{M}_t$ DO 
              $S(i) = \frac{|\{j \mid j \in M_t \wedge m(i) \succeq m(j)\}|}{|M_t| + 1}$; 
              F(i) = S(i); 
  Step 2: Fitness assignment population: 
          FOR EACH $j \in M_t$ DO 
              $F(j) = 1 + \sum_{i \in \bar{M}_t,\, m(i) \succeq m(j)} S(i)$; 
END 



Fig. 5. Fitness assignment in SPEA. 



Cluster 

IN:   $\bar{M}'$: external set; 
      $\bar{N}$: maximum size of external set; 
OUT:  $\bar{M}_{t+1}$: updated external set; 

BEGIN 
  Step 1: Initialize each $i \in \bar{M}'$ as a cluster: 
          $C = \bigcup_{i \in \bar{M}'} \{\{i\}\}$; 
  Step 2: IF $|C| \le \bar{N}$, goto Step 5, else goto Step 3; 
  Step 3: Distance calculation (objective space) of all possible pairs of 
          clusters: 
          $d = \frac{1}{|c_1| \cdot |c_2|} \sum_{i_1 \in c_1,\, i_2 \in c_2} d(i_1, i_2)$; 
  Step 4: Amalgamate two clusters with minimal distance: 
          $C = C \setminus \{c_1, c_2\} \cup \{c_1 \cup c_2\}$; 
  Step 5: Select a representative for each cluster by computing the centroid 
          (point with minimal average distance to all other points in the 
          cluster): 
          $\bar{M}' := \bigcup_{c \in C} c$; 
END 



Fig. 6. Clustering in SPEA. 
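
The clustering procedure is an average-linkage scheme followed by the choice 
of one representative per cluster. The sketch below follows the five steps for 
deterministic points in objective space, using Euclidean distance; this choice 
of metric, the function names, and the example data are assumptions made for 
illustration. 

```python
from itertools import combinations
from math import dist


def average_linkage(c1, c2):
    """Average pairwise distance between the members of two clusters (Step 3)."""
    return sum(dist(p, q) for p in c1 for q in c2) / (len(c1) * len(c2))


def reduce_by_clustering(points, max_size):
    clusters = [[p] for p in points]                       # Step 1
    while len(clusters) > max_size:                        # Step 2
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: average_linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        merged = clusters[i] + clusters[j]                 # Step 4
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    # Step 5: representative = member with minimal summed distance to the rest.
    return [min(c, key=lambda p: sum(dist(p, q) for q in c)) for c in clusters]


points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.2, 5.0), (10.0, 0.0)]
representatives = reduce_by_clustering(points, max_size=3)
print(representatives)
```

The two close pairs are merged first, so the reduced set keeps one point from 
each pair together with the isolated point (10, 0). 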



Clustering algorithm. Depending on the above update of the external set, 
different numbers of clustering steps might be necessary in order to reduce the 
updated set to its maximal size $\bar{N}$. 
The only point where the objective values of two individuals are needed is 
for the determination of the distance d(i, j) between two vectors i and j in 
the objective space. Here, we approximate the distance by the expected value 
of the distance: 

    $E[d(i, j)] = E[\, \| f(m(i)) - f(m(j)) \| \,]$    (11) 



Fitness assignment algorithm. Finally, the fitness assignment algorithm 
Fitness called in Step 3 of SPEA (see Fig. 5) must be changed. We consider 
Step 1 first. Here, the strength 
$S(i) = \frac{|\{j \mid j \in M_t \wedge m(i) \succeq m(j)\}|}{|M_t| + 1}$ is 
computed. In this formula, the size of the set 
$\{j \mid j \in M_t \wedge m(i) \succeq m(j)\}$ is a random variable, the 
expected value of which is computed as follows: 

    $E[S(i)] = \frac{1}{|M_t| + 1}\, E\left[\, |\{j \mid j \in M_t \wedge m(i) \succeq m(j)\}| \,\right] = \frac{1}{|M_t| + 1} \sum_{j \in M_t} P[m(i) \succeq m(j)]$    (12) 

Similarly, in Step 2 of the fitness assignment algorithm, we approximate F(j) 
by 

    $F(j) \approx 1 + \sum_{i \in \bar{M}_t :\, P[m(i) \succeq m(j)] > \alpha} E[S(i)]$    (13) 
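
The modified fitness assignment can be sketched as follows. The code assumes 
that the dominance probabilities between external-set members and population 
members are available as a matrix, and that the strength is normalized by the 
population size plus one as in SPEA; the matrix layout, the strict comparison 
with the threshold, and the function names are assumptions made for 
illustration. 

```python
def expected_strength(row):
    """E[S(i)] for one external individual i.

    row[j] = P[m(i) weakly dominates m(j)] for every population member j.
    """
    return sum(row) / (len(row) + 1)


def approx_fitness(j, p_ext_pop, alpha):
    """F(j): one plus the expected strengths of all external individuals
    that dominate population member j with probability above alpha."""
    return 1.0 + sum(
        expected_strength(row) for row in p_ext_pop if row[j] > alpha
    )


# Two external individuals (rows) and three population members (columns).
p_ext_pop = [
    [1.0, 0.75, 0.0],
    [0.3, 1.00, 0.5],
]
strengths = [expected_strength(row) for row in p_ext_pop]
fitness = [approx_fitness(j, p_ext_pop, alpha=0.25) for j in range(3)]
print(strengths)
print(fitness)
```

The third population member is dominated with sufficient probability only by 
the second external individual and therefore receives the lowest (best) 
fitness among the population members. 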



The remaining Steps of SPEA (1,4, 5, 6, 7) remain unaffected. 



5 Experiments 

In this section, we describe our experiments using the example of Pareto-front 
exploration for a hardware/software partitioning problem: A task graph is to be 
mapped onto programmable hardware and software components such that the 
objectives of period (execution time) and cost are minimized simultaneously. 
The objectives evaluated are the cost of the allocated components and the 
execution time (period) for the mapping (see Fig. 3 for one example 
population). In the following, we run a benchmark of 5 runs of 200 generations 
each for four different variations concerning the interval sizes, called ex1, 
ex2, ex3, and ex4, for population sizes of 40 and 100 individuals each. ex1 
has the smallest intervals for each parameter, ex2 has larger intervals, 
and ex4 has the largest intervals for cost and execution time of a number of 
tasks. For $\alpha = 0.25$, we evaluated two different values of $\alpha'$, 
namely A1 with $\alpha' = 0.5$ and A2 with $\alpha' = 0.9$. In order to compare 
the performance of these two algorithms for the 4 examples ex1, ex2, ex3, and 
ex4, we use the following definition: 
Definition 11. Let A and B be two sets of decision vectors. The function C 
maps the ordered pair (A, B) to the interval [0, 1]: 

    $C(A, B) = \frac{|\{b \in B;\ \exists a \in A : a \succeq b\}|}{|B|}$    (14) 

The value C(A, B) gives the fraction of B that is covered by members of A. 
Note that both C(A, B) and C(B, A) have to be taken into account, since not 
necessarily C(A, B) = 1 - C(B, A). On each example, both algorithms ran in pairs 
on the same initial population; then the expected values of the Pareto-points in 
the final population were compared for each run. The minimal, maximal, and 
average values are reported in Tables 1 and 2 for two different population sizes 
(N = 40 and N = 100). 
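
The coverage measure of Definition 11 is easy to state in code. The sketch 
below assumes minimization and sets of objective vectors given as tuples; the 
function names and the example sets are hypothetical. 

```python
def weakly_dominates(u, v):
    """u <= v in every component (minimization)."""
    return all(ui <= vi for ui, vi in zip(u, v))


def coverage(A, B):
    """C(A, B): fraction of B weakly dominated by at least one member of A."""
    covered = sum(1 for b in B if any(weakly_dominates(a, b) for a in A))
    return covered / len(B)


A = [(1, 4), (3, 2)]
B = [(2, 5), (4, 1), (3, 3)]
c_ab = coverage(A, B)
c_ba = coverage(B, A)
print(c_ab, c_ba)
```

Here two of the three points of B are covered by A, while no point of A is 
covered by B, which illustrates that C(A, B) and 1 - C(B, A) differ in 
general. 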



Table 1. Comparison of algorithms A1 and A2 on four examples for population size 
N = 40. 

Test case         C(A1,A2)                  C(A2,A1) 
            mean     min     max      mean     min     max 
ex1         100%     100%    100%     90%      83%     100% 
ex2         50.8%    25%     75%      94%      85%     100% 
ex3         64.3%    28.5%   100%     95%      85%     100% 
ex4         83.4%    50%     100%     9?%      85%     100% 



Table 2. Comparison of algorithms A1 and A2 on four examples for population size 
N = 100. 

Test case         C(A1,A2)                  C(A2,A1) 
            mean     min     max      mean     min     max 
ex1         80%      0%      100%     96.6%    83%     100% 
ex2         73.4%    50%     89%      91%      80%     95% 
ex3         76.2%    67%     100%     98%      95%     100% 
ex4         100%     100%    100%     83%      60%     95% 



First, we discovered that for each test run and test example, the run-time 
of algorithm A1 is about 3-5 times higher than that of algorithm A2 for equal 
population size and equal number of populations, for all test examples except 
ex1 (no intervals), where both algorithms had equal run-time. The reason is 
that in the case of A1, the probability level for the selection of a design 
point as a point of the external set is much lower than in the case of A2. We 
also discovered that the run-time increases about quadratically with the 
number of points stored in the external set. In algorithm A1, the external set 
almost constantly contained $\bar{N}$ points (often with many multiple copies). 
In the case of A2, the probability level to accept a point as a Pareto-point 
was much higher. Here, the external set contained considerably fewer points 
for all our test cases. 

Concerning the quality of discovered Pareto-points, we see that no matter 
how large the intervals were, algorithm A2 shows a mean coverage of the points 
explored by A1 of more than 90 %. This shows that although A1 has a much 
smaller acceptance level and a larger set of non-dominated points, it performed 
worse. Further experiments, such as testing other and larger examples, are 
necessary to support this result. 

We would finally like to mention that the concept of probabilistic dominance 
as introduced here might be used also in other Pareto-set exploring algorithms. 
In order not to reject possible candidates from the Pareto-set, one should not 
just take the expected values of the objectives but decide on the basis of the 
probability of dominance whether a point should be considered a Pareto-point 
or not. For decision making in the case of probabilistic dominance, any of the 
proposed ways of updating the external set may be used. 
Abstract. Real engineering optimisation problems are often subject to 
parameters whose values are uncertain, or have noisy objective functions. 
Techniques such as adding small amounts of noise in order to identify 
robust solutions are also used. The process used in evolutionary algorithms 
to decide which solutions are better than others does not account for these 
uncertainties and relies on the inherent robustness of the evolutionary 
approach in order to find solutions. 

In this paper, the ranking process needed to provide probabilities of selec- 
tion is re-formulated to begin to account for the uncertainties and noise 
present in the system being optimised. Both single and multi-objective 
systems are considered for rank-based evolutionary algorithms. 

The technique is shown to be effective in reducing the disturbances to 
the evolutionary algorithm caused by noise in the objective function, 
and provides a simple mathematical basis for describing the ranking and 
selection process of multi-objective and uncertain data. 



1 Introduction 

The use of evolutionary algorithms (EAs) in engineering is now well established 
and widespread. As the use of the algorithms migrates deeper into industry, 
and with more processing power available, the scale and characteristics of the 
problems being solved are changing. The objective functions are becoming more 
complex, nonlinear and often uncertain. Many model coefficients are derived by 
experiment and are therefore subject to experimental errors. In real systems, the 
true coefficients will not be the same as measured and are often time dependent 
or correlated with platform motion etc. 

These errors in the modelling are unavoidable and inevitably propagate into 
the outputs of the objective functions, the results of which are used to classify the 
quality of the individual solutions to the problem. All optimisation algorithms 
attempt to find the problem solution that gives the most favourable output from 
the objective functions. With complex systems, evolutionary algorithms are a 
useful tool in that they can tolerate highly nonlinear and noisy system models 
and objective functions and still provide reasonable suggested solutions. 
This robustness to errors has also been exploited by artificially adding noise 
to the objectives in an attempt to identify solutions that are robust to noise and 
uncertainty in the real system. Noise is also often present when trying 
to optimise hardware systems such as in robotics. Noise or uncertainty in the 
objectives tends to slow evolution and reduce solution quality. 

Attempts to reduce noise by repeating objective calculations and then 
averaging or combining results have been tried, but with many realistic 
problems the time to re-evaluate is prohibitive, and the number of samples 
used to average must often be very small and is therefore subject to 
considerable error. Most evolutionary algorithms to date have accepted these 
problems, as the robustness of the algorithms allows small errors to be 
tolerated. 

Therefore we may form two categories of problem: 

1. Noisy: Two successive evaluations of the same chromosome information 
return two different sets of objectives. 

2. Uncertain: Two successive evaluations of the same chromosome return the 
same objective values, but when comparing two different chromosomes, er- 
rors in the modelling and model coefficients may cause the objective values 
returned to classify the wrong solution as being superior. 

This paper takes a fresh look at the problems of uncertain and noisy systems, 
both with single and multiple objectives, in order to provide a selection process 
that is aware of the uncertainties and noise. The techniques discussed form a 
small step towards creating algorithms that can address the problems associated 
with the different categories of noisy or uncertain problems. 

2 Problem Definition 

As most engineering problems have multiple objectives that must be satisfied, 
the work concentrates on multi-objective evolutionary algorithms (MOEA). Carlos 
Coello Coello maintains an excellent database of publications relating to 
multi-objective optimisation. Many of the publications tackling engineering 
problems use techniques such as MOGA and NSGA. These methods use ranking 
techniques to address the problems of non-domination, then use sharing to spread 
the solutions across the objective surface. The use of ranking is widespread in 
EAs to prevent good solutions taking over the population in the early generations 
of the algorithm. Van Veldhuizen and Lamont have studied the benefits and 
disadvantages of a number of techniques, including MOGA and NSGA, and begun to 
define techniques for assessing MOEA performance. These have been developed in 
the context of noise-free and certain problems; similar work is needed to address 
noisy and uncertain problems but is beyond the scope of this paper. 

In all evolutionary algorithms, the key to evolution is being able 
to take two potential solutions to a problem, test them in the problem domain 
against some performance criteria, then, given some values relating to the 
performance of each, decide which solution is better than the other. With noisy or 
uncertain problems, we find that given the results of the performance criteria, 
unless they are very different, we cannot say for certain which solution is better. 
Thus we must now refer to the probability that one solution is better than the 
other. This paper aims to review the process needed in order to rank a set of 
objective results, given that we can no longer make a crisp decision about solution 
superiority. 



3 Comparing Two Uncertain Fitness Measurements 

3.1 Introduction 

In a noise-free situation, if we have two fitness values, A and B, and are trying 
to minimise, the lower value is always superior. However, if we know the 
fitness values are subject to noise, even if the measured fitness A is less than 
the measured fitness B, the expected value of the distribution from which A is 
drawn may be greater than the expected value of the distribution from which 
B is drawn. In that case, choosing A would be the wrong decision. In the presence 
of noise, if we adopt the simple strategy of taking the best measured objective, 
we need to quantify the probability that we have made the wrong decision. 



3.2 Analysis of Distributions with Unknown Expected Values 



If we have a pair of samples from distributions with known characteristics and 
spread, but unknown expected values, we need to be able to calculate the prob- 
ability that although sample A is less than sample B say, the expected value of 
distribution B is less than the expected value of distribution A. This will give 
us a probability of making the wrong decision. Figure 1 shows a scenario with 
two Gaussian distributions. 

Here A and B are the measurements that were observed, and $\mu$ is an arbitrary 
point. The observed value A was less than B and is therefore superior. If the 
expected value of A was actually at point $\mu$, the expected value of B would have 
to be in any position to the left of $\mu$ for us to make the wrong decision. We 
can calculate the probability of the distributions being in this location as the 
probability of value A occurring, given $\mu_a$, multiplied by the probability that 
$\mu_b$ is less than $\mu_a$, shown as the shaded region on Fig. 1. This may be 
described mathematically as 

$$P(\text{wrong}) = \int_{-\infty}^{\infty} \mathrm{pdf}_A(A - \mu) \cdot \mathrm{cdf}_B\big((A - \mu) > (B - \mu)\big)\, d\mu . \tag{1}$$

With Gaussian distributions, we may write this as 

$$P(\text{wrong}) = \int_{-\infty}^{\infty} \frac{1}{\sigma_a\sqrt{2\pi}}\, e^{-\frac{(A-\mu)^2}{2\sigma_a^2}} \left[ \int_{B-\mu}^{\infty} \frac{1}{\sigma_b\sqrt{2\pi}}\, e^{-\frac{y^2}{2\sigma_b^2}}\, dy \right] d\mu . \tag{2}$$



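Fig. 1. Choice between two noisy values 

Equation (3) has the limits on the inner integration adjusted, as the Gaussian 
distribution is symmetrical: pdf(a) = pdf(-a) and cdf(a, $\infty$) = cdf($-\infty$, -a). The 
axis is shifted to make sample point B = 0, then distribution B is normalised, 
modifying distribution A accordingly. Equation (4) has been simplified with the 
replacements $m = (A - B)/\sigma_b$ and $s = \sigma_a/\sigma_b$. 

$$P(\text{wrong}) = \int_{-\infty}^{\infty} \frac{1}{\sigma_a\sqrt{2\pi}}\, e^{-\frac{(\mu-A)^2}{2\sigma_a^2}} \left[ \int_{-\infty}^{\mu-B} \frac{1}{\sigma_b\sqrt{2\pi}}\, e^{-\frac{y^2}{2\sigma_b^2}}\, dy \right] d\mu \tag{3}$$

$$P(\text{wrong}) = \int_{-\infty}^{\infty} \frac{1}{s\sqrt{2\pi}}\, e^{-\frac{(x-m)^2}{2s^2}} \left[ \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}}\, e^{-\frac{y^2}{2}}\, dy \right] dx \tag{4}$$

We may now use the error function 

$$\mathrm{erf}(x) = \frac{2}{\sqrt{\pi}} \int_{0}^{x} e^{-t^2}\, dt$$

to give 

$$P(\text{wrong}) = \int_{-\infty}^{\infty} \frac{1}{s\sqrt{2\pi}}\, e^{-\frac{(x-m)^2}{2s^2}} \left[ \frac{1 + \mathrm{erf}\!\left(\frac{x}{\sqrt{2}}\right)}{2} \right] dx \tag{5}$$

$$= \frac{1}{2} + \frac{1}{2s\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-\frac{(x-m)^2}{2s^2}}\, \mathrm{erf}\!\left(\frac{x}{\sqrt{2}}\right) dx . \tag{6}$$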



Unfortunately (6) is difficult to integrate directly. An alternative approach 
is to recognise that the difference between two Gaussian distributions is also 
Gaussian but with an expected value that is the difference between the expected 
values of the two distributions and a variance which is the sum of the two variances 
(Cramer’s Theorem), i.e., 

$$N(\mu_a, \sigma_a^2) - N(\mu_b, \sigma_b^2) = N(\mu_a - \mu_b,\ \sigma_a^2 + \sigma_b^2) . \tag{7}$$

If A dominates in a minimisation sense, then the area under the resulting 
curve from zero to infinity will give the probability that the decision that A 
dominates B is wrong. If we normalise B to give 

$$N\!\left(\frac{A-B}{\sigma_b}, \frac{\sigma_a^2}{\sigma_b^2}\right) - N(0, 1) = N\!\left(\frac{A-B}{\sigma_b}, \frac{\sigma_a^2}{\sigma_b^2} + 1\right) \tag{8}$$

$$= N(m,\ s^2 + 1) \tag{9}$$

then the probability of being wrong is 

$$P(\text{wrong}) = \frac{1}{\sqrt{2\pi (s^2 + 1)}} \int_{0}^{\infty} e^{-\frac{(x - m)^2}{2 (s^2 + 1)}}\, dx \tag{10}$$

$$= \frac{1}{2} + \frac{1}{2}\,\mathrm{erf}\!\left(\frac{m}{\sqrt{2 + 2s^2}}\right) . \tag{11}$$

Equation (11) can be shown to be equal to (6) by numerical integration. 
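
The numerical check mentioned above can be sketched as follows. This is an illustration only, not the code used for the paper: the function names, the trapezoidal rule, and the truncation of the infinite range to m ± 12s are assumptions made here.

```python
import math

def p_wrong_closed(m, s):
    """Closed form (11): 1/2 + 1/2 erf(m / sqrt(2 + 2 s^2))."""
    return 0.5 + 0.5 * math.erf(m / math.sqrt(2.0 + 2.0 * s * s))

def p_wrong_integral(m, s, steps=20000, width=12.0):
    """Integral form (6), evaluated with the composite trapezoidal rule.

    The Gaussian weight is centred on m with spread s, so the infinite
    range is truncated to m +/- width * s (an assumption of this sketch).
    """
    lo, hi = m - width * s, m + width * s
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps + 1):
        x = lo + k * h
        f = math.exp(-(x - m) ** 2 / (2.0 * s * s)) * math.erf(x / math.sqrt(2.0))
        total += f if 0 < k < steps else 0.5 * f  # half weight at both ends
    return 0.5 + total * h / (2.0 * s * math.sqrt(2.0 * math.pi))

# Compare both forms for a few arbitrary (m, s) pairs.
for m, s in [(-2.0, 1.0), (-0.5, 0.5), (0.0, 2.0), (1.5, 3.0)]:
    print(m, s, p_wrong_integral(m, s), p_wrong_closed(m, s))
```

For each pair the two printed values should agree to within the accuracy of the quadrature.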



3.3 Numerical Approximation of Probability 

If we cannot afford to do multiple evaluations for each chromosome (often the 
case), we can choose a random chromosome before running the EA and perform 
multiple evaluations to estimate the noise standard deviation (and possibly noise 
distribution). This estimate may be used subsequently for all the comparisons 
of individual samples using the equations in Section 3.2. If the noise statistics 
are known to be nonlinear, it may be advantageous to either re-estimate the 
statistics every few generations from an average chromosome, or even from the 
current population. When the same standard deviation is used for comparing 
two objective values, $\sigma_a = \sigma_b$ and therefore s = 1. Thus the probability is 
determined only by the value of m. 

As the case of s = 1 is likely to be the most commonly used, we can tailor 
the equations specifically. The equations are calculated as the probability of 
being wrong in minimisation; this is the same as the probability of acceptance 
in maximisation. Thus the probability of sample A dominating sample B in 
maximisation (P(A > B)) is 

$$P(A > B) = \frac{1}{2} + \frac{1}{2}\,\mathrm{erf}\!\left(\frac{m}{2}\right) = \frac{1}{2} + \frac{1}{2}\,\mathrm{erf}\!\left(\frac{A - B}{2\sigma}\right) , \tag{12}$$

therefore if A = 0, B = 5 and $\sigma$ = 1, P(A > B) $\approx$ 0 as expected. 

Unfortunately, the error function erf(x) is not easy to calculate quickly. It can 
be approximated using Chebyshev fitting (Section 6.2 of the cited reference), but 
even this is not very quick. Recognising that (11) is sigmoidal in shape, other 
standard sigmoidal curves have been fitted to give a good approximation to the 
curve, but allow the probability to be calculated quickly. Figure 2 shows the curve 
approximation and (13) & (14) show the equations. The results of the two different 
approximations are so similar to each other that they appear as a single line on 
the graph. 

$$P(A > B) \approx \frac{1}{2}\left(1 + \tanh\frac{m}{0.8\sqrt{2 + 2s^2}}\right) \tag{13}$$

$$P(A > B) \approx \frac{1}{1 + e^{-\frac{2.5\,m}{\sqrt{2 + 2s^2}}}} \tag{14}$$

Fig. 2. Approximation of P(A > B) against m, dashed curve is (11) 



The small errors in the approximation can be tolerated as a tradeoff for 
the speed gain. Further improvements in calculation speed may be obtained for 
certain problems by utilising the decomposition of tanh(A + B). This is detailed 
in [12]. 
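
The size of the approximation error can be examined with a short script. The sketch below is illustrative: the function names and the grid of m values are assumptions, and the constant 2.5 in the logistic form is taken as the value that makes (14) agree with the tanh form (13).

```python
import math

def p_exact(m, s=1.0):
    """Equation (11): 1/2 + 1/2 erf(m / sqrt(2 + 2 s^2))."""
    return 0.5 + 0.5 * math.erf(m / math.sqrt(2.0 + 2.0 * s * s))

def p_tanh(m, s=1.0):
    """Sigmoidal approximation (13)."""
    return 0.5 * (1.0 + math.tanh(m / (0.8 * math.sqrt(2.0 + 2.0 * s * s))))

def p_logistic(m, s=1.0):
    """Sigmoidal approximation (14); the factor 2.5 is assumed (see lead-in)."""
    return 1.0 / (1.0 + math.exp(-2.5 * m / math.sqrt(2.0 + 2.0 * s * s)))

# Scan m over [-8, 8] for the common case s = 1.
grid = [k / 100.0 for k in range(-800, 801)]
max_err = max(abs(p_tanh(m) - p_exact(m)) for m in grid)
max_gap = max(abs(p_tanh(m) - p_logistic(m)) for m in grid)
print("largest |(13) - (11)| on the grid:", max_err)
print("largest |(13) - (14)| on the grid:", max_gap)
print("P(A > B) for A = 0, B = 5, sigma = 1:", p_exact(-5.0))
```

With these constants the tanh and logistic forms are algebraically the same curve, since (1 + tanh u)/2 = 1/(1 + exp(-2u)), which would explain why they cannot be told apart on a graph.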

3.4 Multi-objective Fitness Functions 

With multiple objectives, we no longer have only two possible outcomes from 
comparing two objectives A and B. We now have the possibility of the two 
objectives being non-dominated. We therefore can have P(A < B), P(A > B), 
and P(A = B) (A is equivalent to B in the Pareto-optimal sense). Figure 3 shows 
the effect graphically, with the point A in the centre of the figure ([0.5, 0.5]) 
representing one sample of the fitness. The shaded regions correspond to regions 
in which there is information to drive the evolutionary process, i.e. A 
either dominates or is dominated by any point x that lies in the shaded region. In the 
remaining regions, any point x will be non-dominated when compared to point 
A and we have no way of deciding between the points. For the ranking process, 
the points are equivalent, and just as good as each other. 

If we have two independent k-objective fitness measurements with 
corresponding objective values $A_1 \ldots A_k$ and $B_1 \ldots B_k$, the probabilities 
P(A < B), P(A > B), and P(A = B) are simply 

$$P(A < B) = \prod_{j=1}^{k} P(A_j < B_j) \tag{15}$$

$$P(A > B) = \prod_{j=1}^{k} P(A_j > B_j) \tag{16}$$

$$= \prod_{j=1}^{k} \big(1 - P(A_j < B_j)\big) \tag{17}$$

$$P(A = B) = 1 - P(A < B) - P(A > B) . \tag{18}$$
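
As an illustration of (15) to (18), the three probabilities can be computed for a pair of measurements as below. This is a sketch under stated assumptions: both measurements share one noise standard deviation `sigma` (so the s = 1 form (12) applies per objective), "A < B" is read as A being lower than B on every objective, and the function names are hypothetical.

```python
import math

def p_less(a, b, sigma):
    """P(a < b) for one objective, from (12) with the roles of a and b swapped."""
    return 0.5 + 0.5 * math.erf((b - a) / (2.0 * sigma))

def domination_probabilities(A, B, sigma):
    """Return (P(A < B), P(A > B), P(A = B)) for k-objective measurements."""
    p_lt = 1.0
    p_gt = 1.0
    for a, b in zip(A, B):
        p = p_less(a, b, sigma)
        p_lt *= p          # product in (15)
        p_gt *= 1.0 - p    # product in (16), written as in (17)
    return p_lt, p_gt, 1.0 - p_lt - p_gt   # remainder, as in (18)

# Two points that trade off against each other are almost surely non-dominated.
print(domination_probabilities((0.5, 0.5), (0.6, 0.4), 0.01))
# A point that is lower on both objectives almost surely satisfies A < B.
print(domination_probabilities((0.1, 0.1), (0.9, 0.9), 0.01))
```

The noise level must be positive here; the noise-free case needs a crisp comparison instead of the error function.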



4 Probabilistic Ranking and Selection 

4.1 Introduction 

Ranking is often employed to prevent a superior solution dominating the early 
populations in the evolutionary process. The conventional ranking process, how- 
ever, does not take the uncertainty in the measured fitness values into account. 
The following sections provide a fresh view of the ranking process and develop 
theory for multi-objective ranking of uncertain fitness measurements. 

4.2 Single Objective Ranking 

Figure 4 shows seven fitness values to be ranked. If we are minimising, the best 
fitness value is the lowest. In the case shown, value A will get rank 0, and value 
G will be rank 6. Values C and D are equal and therefore should be assigned 
the same rank. We can assign rank values as shown in Table 1. 

[Figure 3: Objective 1 (horizontal axis, 0 to 1) against Objective 2 (vertical axis); 
point A at the centre, with regions labelled P(x>A), P(x<A), and Non-Dominated] 

Fig. 3. Noise free non-domination map (maximisation) 

[Figure 4: points labelled A, B, C/D, E, F, G] 

Table 1. Ranks of example fitness values 

Value A B C D E F G 
Rank  0 1 2 2 4 5 6 

If we did not have a tie between C & D, we could use linear selection (19) to 
calculate probabilities of selection, based on the ranked fitness, where n is the 
number of fitness values and $R_i$ is the rank of individual i. The sum of the rank 
values on the denominator will sum to n(n - 1)/2 = 21, which is the sum of the 
arithmetic series zero to six; therefore the best individual will get a probability 
of selection of 2/n and the worst a probability of zero. 

$$P(\text{select}_i) = \frac{(n-1) - R_i}{\sum_{j=1}^{n} R_j} = \frac{2\big((n-1) - R_i\big)}{n(n-1)} \tag{19}$$



If we use the rank values in Table 1 with both the tied fitness values being 
given the best ‘untied’ rank, we find that the sum of the ranks is no longer 
consistent, and in this case $\sum_{j=1}^{n} R_j = 20$. Alternatively, as C & D are tied, it 
may be better to penalise them both a little and therefore take an average of the 
rank positions they could have shared, i.e., give them both a rank of 2.5. This 
would return the overall sum to 21 and would be consistent, no matter how 
many fitness values share a rank. This is the method most used for ranking a 
vector of data. 

We can view the ranking process as counting the number of fitnesses that 
dominate the fitness of interest. If a fitness equal to the current one is 
encountered, then it is half dominating, and half dominated by the current fitness. 
Therefore we can create the rank position numbers by this simple counting 
process. For example, E is dominated by A, B, C & D and therefore has a rank of 
4. Value C is dominated by A & B but is tied with D and so gets a rank of 2.5. 
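
The counting view of ranking can be written down directly. In the sketch below the numerical fitness values are hypothetical (the text gives only their ordering and the tie between C and D), and `count_ranks` is an illustrative name.

```python
def count_ranks(values):
    """Rank for minimisation: each strictly lower value adds 1, each tie adds 0.5."""
    ranks = []
    for i, v in enumerate(values):
        r = 0.0
        for j, w in enumerate(values):
            if j == i:
                continue          # never compare a value with itself
            if w < v:
                r += 1.0          # w dominates v
            elif w == v:
                r += 0.5          # tie: half dominating, half dominated
        ranks.append(r)
    return ranks

# Hypothetical values for A..G; only the ordering and the C/D tie matter.
fitness = [1.0, 2.0, 3.0, 3.0, 4.0, 5.0, 6.0]
print(count_ranks(fitness))        # [0.0, 1.0, 2.5, 2.5, 4.0, 5.0, 6.0]
print(sum(count_ranks(fitness)))   # 21.0, i.e. n(n - 1)/2 for n = 7
```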

Alternatively, we could consider the dominating / not dominating decision as 
being the probability that each fitness value dominates the value of interest. For 
example, if we consider fitness C, the probability that A dominates C is one. 
The probability that G dominates C is zero. The probability that D dominates 
C, from (12) with m = 0, is P = 0.5. Thus we can represent the rank position 
as the sum of probabilities of domination as shown in (20), where $P(F_j > F_i)$ is 
the probability that fitness value j dominates fitness value i. 

$$R_i = \sum_{\substack{j=1 \\ j \neq i}}^{n} P(F_j > F_i) \tag{20}$$



In (20), we have to be sure not to compare fitness $F_i$ with itself. If we did, we 
would get an extra probability of 0.5 added to the sum. We can therefore include 
$F_i$ in the sum, but subtract the effect of comparing the fitness with itself. This 
is shown in (21). 

$$R_i = \sum_{j=1}^{n} P(F_j > F_i) - 0.5 \tag{21}$$

As (21) is based on probability, if the fitness values are uncertain, we can use 
(12) or the approximations (13) or (14) to calculate the probability of domination. 
For example, if fitness values A to G have a standard deviation of $\sigma$ = 1, the 
rank positions (using (13)) compared to the no noise case are shown in Table 2. 
With $\sigma$ = 0, we have conventional ranking and the probabilities will range 
from 2/n to zero. If $\sigma = \infty$, all of the fitness values will be assigned the same rank, 
and will have a probability of selection of 1/n. Thus the standard deviation of 
the uncertainty has a similar effect to selective pressure in conventional selection 
processes. 

Table 2. Ranks with uncertainty of $\sigma$ = 0 and $\sigma$ = 1 

Value   Rank ($\sigma$ = 0)   Rank ($\sigma$ = 1) 
A       0                     0.38 
B       1                     1.27 
C       2.5                   2.31 
D       2.5                   2.31 
E       4                     4.17 
F       5                     5.07 
G       6                     5.49 
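
The probabilistic rank (21) and the linear selection rule (19) can be combined in a few lines. The following is a sketch, not the implementation behind Table 2: the fitness values are hypothetical (so the noisy ranks will not reproduce the table), the exact form (12) is used rather than (13), and the function names are illustrative.

```python
import math

def p_dominates(fj, fi, sigma):
    """P(F_j > F_i) for minimisation: probability that fj is truly lower than fi."""
    if sigma == 0.0:
        # Noise-free limit: crisp comparison, ties count one half.
        return 1.0 if fj < fi else (0.5 if fj == fi else 0.0)
    return 0.5 + 0.5 * math.erf((fi - fj) / (2.0 * sigma))

def probabilistic_ranks(values, sigma):
    """Equation (21): sum over all j, including j == i, minus the 0.5 self term."""
    return [sum(p_dominates(fj, fi, sigma) for fj in values) - 0.5 for fi in values]

def selection_probabilities(ranks):
    """Second form of (19); valid because the ranks sum to n(n - 1)/2."""
    n = len(ranks)
    return [2.0 * ((n - 1) - r) / (n * (n - 1)) for r in ranks]

values = [1.0, 2.0, 3.0, 3.0, 4.0, 5.0, 6.0]   # hypothetical fitness values A..G
for sigma in (0.0, 1.0, 1.0e6):
    ranks = probabilistic_ranks(values, sigma)
    print(sigma, [round(r, 2) for r in ranks], round(sum(ranks), 6))
print(selection_probabilities(probabilistic_ranks(values, 1.0)))
```

In this sketch the ranks sum to 21 for every noise level, and a very large noise level drives every rank towards (n - 1)/2, so that every selection probability approaches 1/n.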



4.3 Multi-objective Ranking 

With multiple objectives, we now have three possible outcomes from comparing 
the two fitness values: A dominates B, A is dominated by B, and A and B 
are non-dominated. If we apply the single objective ranking equation, we find 
that the total of the rank positions is no longer n(n — l)/2 as we now have 
to account for the non-domination. If we have no noise, for two fitness values 
where A dominates B, P(A > B) = 1, P(A < B) = 0, and P(A = B) = 0. 
Therefore when we sum the probabilities of domination, the contribution from 
this pair will be 1. If the fitness values are non-dominated, the corresponding 
probabilities are P(A > B) = 0, P(A < B) = 0, and P(A = B) = 1. We have 
now lost the value 1 from the probability of domination calculations, therefore 
reducing the sum of ranks total. This state will be the same when we compare A 
to B and also when we compare B to A, therefore if we sum all the probabilities 
of non-domination, this will give us twice what was lost from the probability of 
domination calculations. 

If we consider the ranking case for a single dimension, if A and B are iden- 
tical, we cannot choose between them and so add in 0.5 to the sum. With 
non-domination, we also have the situation where we cannot choose between 
objectives and should therefore add 0.5 to the sum as required. In the case of 
uncertain measurements, we can multiply the value of 0.5 by the probability of 
non-domination, and still subtract off 0.5 to allow for comparing the individual 
with itself, thereby maintaining the sum of the rank positions as n(n — l)/2. 
Thus we can add the non-domination term into (21). The rank calculation for 
multi-objective ranking is shown in (22), where n is the number of measurements 
being ranked. 

$$R_i = \sum_{j=1}^{n} P(F_j > F_i) + \frac{1}{2} \sum_{j=1}^{n} P(F_j = F_i) - 0.5 \tag{22}$$

This probabilistic ranking equation allows chromosomes to be selected based 
on uncertain multi-objective fitness measurements. For the objectives shown in 
Fig. 5 we can calculate the rankings in order to minimise the fitness values. 
Table 3 shows ranks (R) for no noise, and 1 standard deviation noise. 

Fig. 5. Multiple fitness values to be ranked 



In the example, we see that A is non-dominated with B, C, D, & E and 
therefore gets a rank of 2. Fitness B is non-dominated with A, C, & D but 
shares an objective value with E, and is therefore half dominating and half 
non-dominated with E. The rank of B is 1.5 from the three non-dominated points 
and 0.25 from E, giving a total of 1.75. We also see that each of the columns of 
Table 3 sums to 15 (= n(n - 1)/2) as expected. The ranking process is $O(n^2)$, 
as are many of the other ranking methods. 
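
A minimal sketch of the multi-objective rank (22) follows. The coordinates are hypothetical: they were chosen only so that the domination relations match those described for points A to F in the example, since the true coordinates of the figure are not given in the text. A very small noise level is used to approach the noise-free column.

```python
import math

def p_less(a, b, sigma):
    """P(a < b) for one objective with a shared noise level, from (12)."""
    return 0.5 + 0.5 * math.erf((b - a) / (2.0 * sigma))

def pair_probabilities(Fj, Fi, sigma):
    """Return (P(F_j dominates F_i), P(F_j and F_i are non-dominated)), minimising."""
    p_dom = 1.0   # F_j better on every objective
    p_sub = 1.0   # F_j worse on every objective
    for fj, fi in zip(Fj, Fi):
        p = p_less(fj, fi, sigma)
        p_dom *= p
        p_sub *= 1.0 - p
    return p_dom, 1.0 - p_dom - p_sub

def multiobjective_ranks(points, sigma):
    """Equation (22): domination terms plus half the non-domination terms, minus 0.5."""
    ranks = []
    for Fi in points:
        r = -0.5
        for Fj in points:   # the self comparison contributes exactly 0.5
            p_dom, p_eq = pair_probabilities(Fj, Fi, sigma)
            r += p_dom + 0.5 * p_eq
        ranks.append(r)
    return ranks

# Hypothetical coordinates for A..F (both objectives minimised).
points = [(1.0, 9.0), (2.0, 6.0), (4.0, 4.0), (6.0, 2.0), (7.0, 6.0), (9.0, 10.0)]
print(multiobjective_ranks(points, 1e-3))   # [2.0, 1.75, 1.5, 1.5, 3.25, 5.0]
print(sum(multiobjective_ranks(points, 1.0)))
```

The double loop makes the quadratic cost in the number of points explicit, and the ranks sum to n(n - 1)/2 = 15 for any positive noise level.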



Table 3. Ranks with uncertainty of $\sigma$ = 0 and $\sigma$ = 1 

Value   R ($\sigma$ = 0)   R ($\sigma$ = 1) 
A       2                  2.27 
B       1.75               1.65 
C       1.5                1.42 
D       1.5                1.92 
E       3.25               3.22 
F       5.0                4.53 



In the general noisy or uncertain scenario, we see that the proximity of other 
fitness values, even if only close on one objective, can influence how the rank is 
assigned. Measurements such as C which are relatively well spaced out on all 
objectives are ranked more highly than other fitness values that are uncertain. 

With no noise, the basic ranking by just counting how many points dominate 
each fitness measurement described by Fonseca and Fleming is very similar, 
but does not allow for the non-dominated cases. The sum of the rank values will 
not be consistent if non-dominated solutions are present, causing a bias towards 
non-dominated solutions over other solutions. The ranking used by Srinivas 
and Deb is based on ‘layers’ of non-dominated solutions and has no 
consistency with regards to how many layers, or ranks, are produced, therefore making 
calculating selection probabilities awkward. 

It is interesting to note that if we require an objective to be maximised, 
setting $\sigma$ negative will cause the probabilities to be calculated for maximisation: 
setting $\sigma$ negative has the same effect as negating the fitness values (the 
conventional way of converting from minimisation to maximisation). Therefore both 
minimisation and maximisation objectives may be handled easily by just setting 
the sign of the corresponding value of $\sigma$ appropriately. 

Limits on objectives, constraints on the chromosomes, and sharing can all 
be implemented easily within this ranking framework, making interactive 
decision making with uncertain or noisy systems viable. The equations for limits, 
constraints, and sharing are derived and discussed in [12]. 

5 Experiment Results 

5.1 Introduction 

Noise and uncertainty can be split into two broad categories relating to noise 
that occurs within the process (Type A) and measurement noise (Type B): 

1. Type A Noise: Noise is applied to the chromosome before the objective 
function is calculated, i.e. O = F(x + N). 

2. Type B Noise: Noise is applied to the objective function after calculation, 
i.e. O = F(x) + N. 

Both types of noise are of interest and often the observed noise will be a combi- 
nation of type A and B. 
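
The difference between the two noise types is simply where the perturbation enters the evaluation. The sketch below uses a hypothetical two-objective function, `toy_objectives`, purely for demonstration; the helper names and the use of independent Gaussian noise on every component are assumptions.

```python
import random

def evaluate_type_a(f, x, sigma, rng):
    """Type A: perturb the chromosome, then evaluate, O = F(x + N)."""
    noisy_x = [xi + rng.gauss(0.0, sigma) for xi in x]
    return f(noisy_x)

def evaluate_type_b(f, x, sigma, rng):
    """Type B: evaluate, then perturb each objective, O = F(x) + N."""
    return [o + rng.gauss(0.0, sigma) for o in f(x)]

def toy_objectives(x):
    """Hypothetical two-objective function used only for this demonstration."""
    return [x[0] ** 2 + x[1] ** 2, (x[0] - 1.0) ** 2 + x[1] ** 2]

rng = random.Random(0)   # fixed seed so repeated runs give the same numbers
x = [0.25, 0.75]
print("noise-free:", toy_objectives(x))
print("type A:    ", evaluate_type_a(toy_objectives, x, 0.1, rng))
print("type B:    ", evaluate_type_b(toy_objectives, x, 0.1, rng))
```

With type A noise the effect on the objectives depends on the local shape of F, whereas type B noise has the same spread everywhere.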

Trials have been performed to assess how the noise affects the assigned rank 
position within a population of chromosomes. For the following results, 100 
two-parameter chromosomes were generated uniformly distributed in the range [0,1]. 
A scaled version of the objective function MOP3, defined by Van Veldhuizen and 
Lamont and given in (23), was used to provide input data to the ranking 
processes, with either type A or B noise applied as appropriate. The data were 
ranked and the assigned rank position for each chromosome recorded. The process 
was repeated 1000 times with different values chosen for the applied noise each 
time. For each chromosome, the standard deviation of the rank position was 
calculated. The mean standard deviation of the 100 chromosome rank positions 
was then generated and plotted. 

$$\begin{aligned}
m_1 &= 0.5\,(x^2 + y^2) + \sin(x^2 + y^2) , & O_1 &= \frac{m_1}{8.249} \\
m_2 &= \frac{(3x - 2y + 4)^2}{8} + \frac{(x - y + 1)^2}{27} + 15 , & O_2 &= \frac{m_2 - 15}{46.940} \\
m_3 &= \frac{1}{x^2 + y^2 + 1} - 1.1\, e^{-x^2 - y^2} , & O_3 &= \frac{m_3 + 0.1}{0.296}
\end{aligned} \tag{23}$$



In (23), x(1) and x(2) are the two parameters of the input chromosome in the 
range [0,1]. The parameters x and y are scaled to lie within [-3,3] as defined by 
Van Veldhuizen and Lamont. The three objective functions are then calculated 
and scaled to give each of the objectives in the range [0,1]. Noise was then applied 
either to the input chromosome x for type A noise, or to the output objectives 
O for type B noise. The applied noise was Gaussian with a standard deviation 
of $\sigma$. 
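
The scaling constants in (23) can be checked by evaluating the objectives over the unit square. The sketch below makes two assumptions: the chromosome is mapped linearly onto [-3, 3] as x = 6 x(1) - 3 and y = 6 x(2) - 3, and the unscaled objectives take the three-term form reconstructed in (23). The function name and the grid resolution are illustrative.

```python
import math

def scaled_objectives(c1, c2):
    """Map a chromosome (c1, c2) in [0, 1]^2 to the three scaled objectives of (23)."""
    x = 6.0 * c1 - 3.0   # assumed linear map of [0, 1] onto [-3, 3]
    y = 6.0 * c2 - 3.0
    r2 = x * x + y * y
    m1 = 0.5 * r2 + math.sin(r2)
    m2 = (3.0 * x - 2.0 * y + 4.0) ** 2 / 8.0 + (x - y + 1.0) ** 2 / 27.0 + 15.0
    m3 = 1.0 / (r2 + 1.0) - 1.1 * math.exp(-r2)
    return (m1 / 8.249, (m2 - 15.0) / 46.940, (m3 + 0.1) / 0.296)

# Coarse grid scan to see how closely each scaled objective spans [0, 1].
grid = [k / 200.0 for k in range(201)]
values = [scaled_objectives(a, b) for a in grid for b in grid]
for k in range(3):
    column = [v[k] for v in values]
    print("O%d: min %.4f, max %.4f" % (k + 1, min(column), max(column)))
```

Under these assumptions each objective spans approximately [0, 1], with small deviations at the ends due to the rounding of the published constants.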

The ranking algorithms from NSGA and MOGA were generated for compar- 
ison with the new multi-objective probabilistic selection evolutionary algorithm 
(MOPSEA) ranking process developed in this paper. With a different set of 100 
initial chromosomes, a slightly different set of graphs will result. The differences 
have been found to be small however. 
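
The trial procedure described in this section can be sketched as a small harness that accepts any ranking function. Everything here is illustrative rather than the code used for the reported results: the population is a set of random two-objective points, only type B noise is shown, the counts are reduced to keep the run short, and `dominance_count_ranks` is a plain crisp ranking used as a stand-in.

```python
import random
import statistics

def dominance_count_ranks(points):
    """Crisp ranking for minimisation: number of points that dominate each point."""
    def dominates(p, q):
        return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))
    return [sum(dominates(q, p) for q in points) for p in points]

def mean_rank_std(clean, applied_sigma, trials, rank_fn, rng):
    """Mean over chromosomes of the standard deviation of their rank position."""
    history = [[] for _ in clean]
    for _ in range(trials):
        # Type B noise: perturb the objectives after evaluation.
        noisy = [[o + rng.gauss(0.0, applied_sigma) for o in F] for F in clean]
        for h, r in zip(history, rank_fn(noisy)):
            h.append(r)
    return statistics.mean([statistics.pstdev(h) for h in history])

rng = random.Random(1)
clean = [[rng.random(), rng.random()] for _ in range(30)]   # hypothetical objectives
for applied_sigma in (0.0, 0.05, 0.2):
    print(applied_sigma, mean_rank_std(clean, applied_sigma, 200, dominance_count_ranks, rng))
```

Passing a different `rank_fn` allows ranking processes to be compared on the same noisy data.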



5.2 Results 

From Figures 6 & 7 it is clear that both MOGA and MOPSEA outperform 
the NSGA ranking process in the presence of noise for this objective function. 
As the uncertainty parameter $\sigma_n$ is increased, it is clear that MOPSEA can 
outperform both alternative algorithms. The specific performance of each algorithm 
is dependent on the objective function though. Other objective functions are 
covered in [12]. 

6 Conclusions 

The results have shown that the modified ranking process can reduce the 
disturbances in the rank positions caused by noisy objectives. Unlike conventional 
ranking processes, the rank values and therefore the corresponding selection 
probabilities take some account of the noise and uncertainty in the system. The 
theory developed in this paper forms an important first step towards directly 
addressing noise and uncertainty in multi-objective problems. The simplicity of the 
ranking and selection equations may also provide a route to further theoretical 
research into the operation and performance of evolutionary algorithms. 

Fig. 6. Applied noise with respect to mean standard deviation of rank position 
for MOP3, type A noise. Performance of MOGA and NSGA ranking compared to 
MOPSEA with $\sigma_n$ = 0 & $\sigma_n$ = 0.3 

Fig. 7. Applied noise with respect to mean standard deviation of rank position 
for MOP3, type B noise. Performance of MOGA and NSGA ranking compared to 
MOPSEA with $\sigma_n$ = 0 & $\sigma_n$ = 0.3 

Abstract. This paper proposes an exploratory multi-objective evolutionary 
algorithm (EMOEA) that integrates features of tabu search and 
evolutionary algorithms for effective multi-objective optimization. It 
incorporates a tabu list and tabu constraint for individual examination and 
preservation to enhance the evolutionary search diversity in multi-objective 
optimization, which helps to prevent the search from becoming trapped in 
local optima and at the same time promotes the evolution towards the global 
Pareto-front. A novel method of lateral interference is also suggested, which is 
capable of distributing non-dominated individuals uniformly along the 
discovered Pareto-front at each generation. Unlike existing niching/sharing 
methods, lateral interference can be performed without the need for any 
parameter setting and can be flexibly applied in either the parameter or objective 
domain depending on the nature of the optimization problem involved. The 
proposed features are examined experimentally in order to illustrate their 
behavior and usefulness in the algorithm. 



1. Introduction 

Evolutionary algorithms have been recognized to be well suited for MO optimization 
problems [1,2]. Unlike conventional methods that linearly combine multiple attributes 
to form a composite scalar objective function, evolutionary algorithms for MO 
optimization incorporate the concept of Pareto optimality or modified selection 
schemes to evolve a family of solutions at multiple points along the trade-off surface 
simultaneously. Since Schaffer’s work [3], evolutionary techniques for MO 
optimization have been gaining significant attention from researchers in various 
fields, which is reflected by the high volume of publications on this topic in the last 
few years (over 25 Ph.D. theses, more than 80 journal papers, and more than 300 
conference papers). For more information on various techniques of handling 
multi-objective optimization problems via evolutionary algorithms, readers may refer to 
[4-6]. 

This paper proposes a new exploratory multi-objective evolutionary algorithm 
(EMOEA), which incorporates the memory-based feature of tabu search (TS) to 
maintain the stability of MO optimization towards a global and uniform Pareto-front. 
The hybridization of TS in evolutionary optimization helps to improve the MO search 
performance by avoiding repeats of previously explored paths to the found peaks, 
i.e., local optima in the search space are avoided while good regions are well 
explored. In addition, a novel method of lateral interference, which is highly efficient at 
distributing non-dominated individuals uniformly along the discovered Pareto-front, is 
also proposed. It can be performed without the need for any parameter setting and can 
be flexibly applied in either the parameter or objective domain depending on the nature of 
the optimization problem involved. 

2. Exploratory Multi-objective Evolutionary Algorithm 

In general, multi-objective (MO) optimization can be defined as the problem of 
optimizing a vector of non-commensurable and often competing objectives or cost 
functions, i.e., it seeks a parameter set P for 

$$\min_{P \in \Phi} F(P) \tag{1}$$

where $P = \{p_1, p_2, \ldots, p_n\}$ is an individual vector with n parameters and $\Phi$ defines a 
set of individual vectors. $\{f_1, f_2, \ldots, f_m\}$ are m objectives to be minimized and 
$F = \{f_1, f_2, \ldots, f_m\}$. Instead of a single optimum, the solution to an MO optimization 
problem is often a family of points known as the Pareto optimal set, where each 
objective component of any point along the Pareto-front can only be improved by 
degrading at least one of its other objective components [7,8]. In the total absence of 
information regarding the preferences of objectives, a ranking scheme based upon 
Pareto optimality is regarded as an appropriate approach to represent the strength of 
each individual in an evolutionary algorithm for MO optimization [2,8]. A vector $F_a$ is 
said to dominate another vector $F_b$, denoted as $F_a \prec F_b$, iff 

$$f_{a,i} \le f_{b,i} \ \ \forall\, i \in \{1, 2, \ldots, m\} \quad \text{and} \quad \exists\, j \in \{1, 2, \ldots, m\} \ \text{where} \ f_{a,j} < f_{b,j} \tag{2}$$

The Pareto ranking scheme assigns the same smallest cost to all non-dominated 
individuals, while the dominated individuals are ranked according to how many 
individuals in the population dominate them. So, the rank of an individual x in a 
population can be given by rank(x) = 1 + $q_x$, where $q_x$ is the number of individuals 
dominating the individual x in the objective domain [2]. The authors of [2] also extended 
the Pareto domination scheme in their proposed multi-objective genetic algorithm 
(MOGA) to include goal and priority information for MO optimization. Although 
MOGA is a good approach, the algorithm only allows a single goal and priority vector 
setting, which may be difficult to define a priori, before the optimization process [4]. 
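
The dominance test (2) and the rank rule rank(x) = 1 + q can be sketched in a few lines. The objective vectors below are hypothetical and the function names are illustrative; all objectives are minimised.

```python
def dominates(fa, fb):
    """Definition (2): fa is no worse on every objective and strictly better on one."""
    return all(a <= b for a, b in zip(fa, fb)) and any(a < b for a, b in zip(fa, fb))

def pareto_ranks(population):
    """rank(x) = 1 + q, with q the number of individuals that dominate x."""
    return [1 + sum(dominates(other, x) for other in population) for x in population]

# Hypothetical objective vectors.
objectives = [(1.0, 4.0), (2.0, 2.0), (3.0, 3.0), (4.0, 1.0), (4.0, 4.0)]
print(pareto_ranks(objectives))   # [1, 1, 2, 1, 5]
```

All non-dominated individuals receive rank 1, while the last point is dominated by the four others and so receives rank 5.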

With a modified Pareto-domination scheme, Tan et al. [9] proposed a unified 
multi-objective evolutionary algorithm (MOEA) that is capable of comparing the 
domination among individuals for multi-objective optimization dealing with both soft 
and hard optimization constraints. The scheme also allows the incorporation of 
multiple goals and priorities with different combinations of logical “AND” and “OR” 
operations for greater flexibility and higher-level decision support. Extending 
the Pareto domination and ranking schemes of Tan et al. [9], this paper proposes an 
exploratory multi-objective evolutionary algorithm that incorporates the 
memory-based feature of tabu search to maintain the stability of MO optimization towards a 
global and uniform Pareto-front. Sections 2.1 and 2.2 detail the principle of lateral 
interference and the tabu-based individual examination rule, respectively. The algorithm 
and implementation of EMOEA, including the features of lateral interference and 
tabu-based individual examination, are then fully described in Section 2.3. 

2.1 Lateral Interference (LI) 

In order to evolve an equally distributed population along the Pareto-front and to 
distribute the population at multiple optima in the search space, many methods 
[1,10,11] have been proposed. Among them, the ‘niche induction’ technique by 
means of a sharing function [10] is the most popular approach for evolving an equally 
distributed population along the Pareto-front in MO optimization. This method creates 
sub-divisions in the objective domain by degrading an individual’s fitness upon the 
existence of other individuals in its neighborhood defined by a shared distance. 

To avoid the setting of a sharing distance, a new population distribution method is 
proposed. It is capable of uniformly distributing all individuals along the Pareto-front 
for MO optimization without the need for any parameter setting. It can be 
applied in either the parameter domain or the objective domain as needed. The 
method is called Lateral Interference and is motivated by the ecological view of 
resource competition [12]. It works based upon the principles of 
exploitation competition and interference competition, which form the basis of 
distributing the population uniformly along the Pareto-front in MO optimization without 
the need for any parameter setting. According to the first principle (exploitation 
competition), individuals with higher fitness or lower cost values will always be more 
likely to win when competing with individuals with lower fitness or higher cost values. 
The second principle (interference competition) only takes place among 
individuals with the same level of fitness or cost, or in other words, individuals that are 
equally strong in the exploitation competition. 

Let us consider a sub-population P’ containing N’ equally-fit individuals (i.e. with 
the same level of fitness or cost values, where the intermediate between two levels, 
t > 0) in the m-dimensional observed feature space S (which can be either the objective 
space or the parameter space), $x_1, x_2, \ldots, x_{N'}$, with $x_i \in U_1 \times U_2 \times \cdots \times U_m$ 
$\forall\, i = 1, 2, \ldots, N'$ and $U_j = U(a_j, b_j)$, where $U(a_j, b_j)$ is any value within the range 
of $a_j$ and $b_j$, while $a_j$ and $b_j$ are the minimum and maximum boundary values of 
$x_{i,j}$ $\forall\, j = 1, 2, \ldots, m$ as given by 

$$a_j = \min\{x_{i,j}\ \ \forall\, i = 1, 2, \ldots, N'\} \tag{3a}$$

$$b_j = \max\{x_{i,j}\ \ \forall\, i = 1, 2, \ldots, N'\} \tag{3b}$$

The metric distance between any two individuals, i and j, is defined by: 

$$d(x_i, x_j) = \| x_i - x_j \|_2 \tag{4}$$

where $\| \cdot \|_2$ implies the 2-norm. To limit the distance metric to the interval [0, 1], 
the normalized metric distance between individuals i and j, $\bar{d}(x_i, x_j)$, is used and 
computed by dividing the metric distance by the metric distance between a and b, 
where a and b are the minimum and maximum boundary vectors of all the variables 
respectively, i.e., 

$$\bar{d}(x_i, x_j) = d(x_i, x_j) / \| a - b \|_2 \tag{5}$$

Subsequently, the nearest individual from individual i, denoted as s(i), is defined as 
the individual that gives the smallest norm, or mathematically, 

$$s(i) = \{\, j : \bar{d}(x_i, x_j) < \bar{d}(x_i, x_k),\ j \neq i,\ \forall\, k = 1, 2, \ldots, N',\ k \neq i\ \&\ j \,\} \tag{6}$$

and the influence distance of individual i is given as. 

The larger the value of $\lambda_i$, the more strongly individual i interferes with its nearby 
individuals. With the information of influence distance, the territory of each 
individual can be defined as given in Definition 1. 



Definition 1: (Territory)

The territory of individual i, denoted $T_i$, is the area within which any other
individual will be interfered with and inhibited by individual i from
getting the resource. The sufficient condition for a given point $x_j$ to be within
the territory $T_i$ of individual i, or mathematically $x_j \in T_i$ provided that $j \ne i$, is
$\bar{d}(x_j, x_{s(i)}) < \lambda_i$, where s(i) is the index of the individual that is closest to
individual i as defined in eqn. 6, while the influence distance $\lambda_i$ of
individual i is as defined in eqn. 7.



After determining the territory of each individual, the severity or impact of being
interfered with and inhibited, or simply the interfered severity, denoted $H_s(j)$, of
individual j can be measured by the number of times that it has been
interfered with and inhibited:

$H_s(j) = \sum_{i=1,\, i \ne j}^{N'} \Lambda(i, j)$ (8a)

where

$\Lambda(i, j) = \begin{cases} 1, & \text{if } x_j \in T_i \\ 0, & \text{otherwise} \end{cases}$ (8b)

The proposed lateral interference differs from the sharing method in that,
instead of defining a fixed sharing distance for the whole population, it applies the
concept of territory, where the area of the territory depends on the influence distance,
which is adaptive and varies from one individual to another. This eliminates the
need to prefix the sharing distance as required by the sharing method, where a good
setting of the sharing distance, estimated from the trade-off surface, is critical and is
usually unknown in many optimization problems [4].

The lateral interference can be easily incorporated with Pareto dominance in the
genetic selection process. After the cost evaluation process, where every individual in
a population has been evaluated based on the MO functions, these individuals are
ranked according to the Pareto dominance and ranking scheme [9], with a smaller
rank representing a fitter individual. The individuals are then fed to the lateral
interference, in which the rank values are modified according to the
territory of each individual. As mentioned, the aim of the interference competition is
to further differentiate those individuals with the same rank value after the exploitation
competition in order to distribute the individuals uniformly. Before the lateral
interference, the non-dominated individuals in the population are classified into one
category, with a common dummy cost value. To maintain the diversity of the population,
these classified individuals undergo lateral interference and the resulting severity
($H_s$) of being interfered with for each classified individual is added to its dummy cost. This
group of individuals is then ignored and another layer of non-dominated individuals
(ignoring the previously classified ones) is considered, where the assigned dummy
cost is set higher than the highest interfered cost of the previously classified
individuals. This process continues until all individuals in the individual list have
been classified. The final dummy cost value of an individual after the lateral
interference is referred to here as the interfered cost. The smaller the interfered
cost, the better the individual.

2.2 Tabu-Based Individual Examination Scheme 

This section presents an individual examination scheme that incorporates the features
of TS (tabu search) [13] into the evolutionary algorithm to avoid repeating previous
moves in MO optimization. Besides maintaining the stability of evolution towards the
global and uniform Pareto-front, the scheme uses the knowledge in the tabu list to
enhance the MO search performance by avoiding repeated paths to previously
discovered peaks as well as by encouraging long-distance exploration to discover other
possible peaks. Fig. 1 depicts the heuristic reasoning of the individual examination
scheme. Given a tabu list and an individual list, every individual is examined with
reference to the tabu list to decide whether to accept it or reject it from the individual
list, while at the same time the tabu list is updated whenever an individual dominates
any member of the tabu list. Starting from the first individual in the reproduced individual
list, if the examined individual dominates any member of the tabu list, the individual
replaces the dominated member(s) in the tabu list. Otherwise, if the individual is
dominated by any member of the tabu list, it is kept in the individual list if any of
its objective components is better than the best value of that objective component found in the
tabu list, or if the individual is not tabu. For the former case, considering a
minimization problem, an individual x, represented by the vector $F_x(i)$ $\forall i \in \{1, 2, \ldots, m\}$
in the m-dimensional objective domain, is said to have one of its objective component
values better than the best objective component value in the tabu list if and only if $\exists\, i$ s.t.
$F_x(i) < \min(\{F_l(i)\})$ $\forall l \in \{1, 2, \ldots, N_t\}$, where $N_t$ is the size of the tabu list. This
criterion of acceptance is used to diversify the individuals in the objective domain in
order to provide a wider range of choices and better information exchange during
the genetic crossover operation. If neither condition is satisfied, the
individual is rejected from the individual list and prohibited from surviving in
the next generation. If the individual is not dominated by any member of the
tabu list and the tabu list is not full, i.e., its size has not reached the maximum limit,
the individual is added to the tabu list. Otherwise, if the individual is able to
interfere with more than one member of the tabu list within its territory, it is allowed
to replace the tabu member that has the shortest distance from the individual in the
space concerned. This reasoning promotes a uniform distribution of the tabu list.
If the condition is not met, the individual is examined with the objective domain
diversification test and the tabu test as explained above. This examination process is
then repeated for the next individual in the individual list until all the individuals have been
tested. Note that although all the members of the tabu list are non-dominated, not all the
non-dominated individuals in the individual list are added to the tabu list when they
are tabu. This takes into account the computational effort of keeping all the non-dominated
individuals in the tabu list and avoids the danger of too much emphasis on the good
individuals, which may lead to premature convergence in the evolution.
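
Two of the tests used by the examination scheme, Pareto dominance and the objective-domain diversification test, are simple enough to sketch directly. The following Python fragment is a minimal illustration for minimization problems, not the authors' implementation: the function names are hypothetical, objective vectors are plain tuples, and the territory-based replacement and the tabu test itself are left out because they depend on the influence distance.

```python
def dominates(f, g):
    """True if objective vector f Pareto-dominates g (all objectives minimized)."""
    return all(x <= y for x, y in zip(f, g)) and any(x < y for x, y in zip(f, g))

def beats_best_component(f, tabu_list):
    """Diversification test: some component of f is strictly below the best
    (smallest) value of that component over the whole, non-empty tabu list."""
    return any(f[i] < min(t[i] for t in tabu_list) for i in range(len(f)))

def replace_dominated(f, tabu_list):
    """If f dominates one or more tabu members, return a new list in which they
    are replaced by f; otherwise return the members unchanged."""
    survivors = [t for t in tabu_list if not dominates(f, t)]
    if len(survivors) < len(tabu_list):
        survivors.append(f)
    return survivors
```

For example, with the tabu list `[(1.0, 4.0), (2.0, 2.0), (4.0, 1.0)]`, the vector `(0.5, 5.0)` is dominated by no member and passes the diversification test through its first component, whereas `(3.0, 3.0)` is dominated by `(2.0, 2.0)` and fails the test.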

2.3 Flow Chart of EMOEA 

The overall program flowchart of the proposed EMOEA algorithm, which integrates
both the lateral interference for population diversity and the individual
examination scheme with tabu list for examining evolved individuals in the individual
list, is shown in Fig. 2. In general, EMOEA involves two different lists, the individual
list and the tabu list, which interact with and influence each other along the
evolution. Solutions in the individual list play the part of inductive learning on the basis
of genetic evolution, while the solutions in the tabu list play the role of controlling the
evolution through a heuristic reasoning approach (deductive).

At the initial stage of evolution, a list of $N_c$ individuals is initialized
randomly or biased with a-priori knowledge, where $N_c$ is the size of the individual list
in the evolution. The individual list is then decoded to parameter vectors for cost
evaluation. Subsequently, all the evaluated individuals are ranked according to the
specifications assigned. All the non-dominated individuals (those with rank = 1) are
copied to the empty tabu list while the rest of the individual list is fed to the lateral
interference to compute the interfered cost as described in Section 2.1. If the stopping
criterion is not met, genetic operations are applied to the evaluated individuals.
Here, simple genetic operations consisting of tournament selection based on interfered
cost, standard crossover and mutation are performed to reproduce offspring for the
next generation. Note that the use of tournament selection avoids the need for
chromosome sorting, interpolating and fitness averaging at every generation as
required by the ranking algorithm [2], which consequently reduces the overhead
computation time. After the genetic operations, the resulting reproduced
individual list with size $N_c$ is evaluated and examined by the tabu-based
individual examination scheme. This scheme is equipped with the knowledge of the tabu
list to further enhance the search performance by avoiding repeats of previous
exploration paths to the peaks already found. Besides encouraging long-distance exploration to
provide better exploration of other possible peaks in the solution space, it is also
capable of maintaining the stability of evolution towards the global and uniform
Pareto-front. Subsequently, the resulting new individual list and the updated tabu list are
fed into the MO genetic evolution to form the combined individual list, which has the
size $N_c + N_t$, where $N_t$ is the size of the tabu list. This combined individual list is
fed to the next generation of evolution and this process is repeated until the stopping
criterion is met.



3. Experimental Results 

In order to validate the capability of EMOEA in escaping from harmful local optima,
the benchmark problem originating from Deb [14] is used as a test-bed here. It is a
two-objective minimization problem containing local optima in which optimization
algorithms can easily be trapped. In this paper, the original test problem is
modified and expanded such that the global optimum is farther away from the local
optimum and a higher dimensionality of the search space is considered. The purpose of
these modifications is to achieve a higher degree of optimization difficulty, in the
sense that the search algorithms have a stronger tendency to converge prematurely
to the local optimum and less chance of discovering the global optimum
while the population is converging to the local optimum. The modified two-objective
functions to be minimized are:



$f_1 = x_1$ (9a)

$f_2 = \frac{1}{x_1} \prod_{i=1}^{3} g_i$ (9b)

where the functions $g_i$ are given by eqns. (9c) to (9e). [Eqns. (9c)-(9e): expressions not recoverable from the extracted text.]



Fig. 3 depicts the function $g_i$ for $0 < x_{i+1} < 1$. As can be seen, $g_i$ is a bimodal
function with $x_{i+1} = 0.1$ as the global minimum and $x_{i+1} = 0.9$ as the local minimum,
where their distance of 0.8 has been increased by 100% from the original
problem with a distance of 0.4 [14]. Fig. 4 shows the $f_1$-$f_2$ plot, with the local and global
Pareto-optimal curves represented by dashed and solid lines, respectively. Note that the
shaded region represents the infeasible area.

Fig. 1. Heuristic reasoning of tabu-based individual examination scheme

Deb [14] has shown that most solutions lead towards the local Pareto-optimal front
and only a few solutions lead towards the global Pareto-optimal front. In other words, this
optimization problem has a high bias towards the local optimum and thus it is a Genetic
Algorithm (GA)-deceptive problem [15]. This behavior illustrates the property that a
GA-deceptive problem can be a GA-hard problem. In Deb [14], the NSGA [8] was
used and the simulation results indicated that it got trapped at the local Pareto-optimal
solutions. He also observed that in 25 runs there existed at least one solution in the
global basin in the initial population, and yet the algorithm could not converge to the global
Pareto-optimal front. Hence, this optimization problem should challenge optimization
algorithms to search for the global Pareto-optimal front in the presence of a highly biased
local Pareto-optimal front in MO optimization.




Fig. 2. Overall program flowchart of EMOEA 

The simulation is implemented with the decimal coding scheme of 3 digits per
parameter [9], standard two-point crossover with a probability of 0.7, and standard
mutation with a probability of 0.01. The lateral interference is performed in the
objective space without the need for any parameter setting. Furthermore, small values
of $N_c = 30$ and $N_t = 10$ are employed to challenge the EMOEA. The simulation is run until
generation 240, which is chosen for the purpose of visualizing how
the EMOEA, initially trapped at the local optimum, escapes from it and finds a new
direction towards the global optimum. Fig. 5 illustrates
the trend of parameters in the population versus the generation number, where each
dot represents the parameter value of a particular individual in the population. Note
that in Fig. 5a, as the dimension of parameter 1 is flat, the values of the first parameter in
the population tend to distribute themselves to uniformly cover the Pareto-front in
the objective domain, as desired. For the other parameters, i.e., the 2nd-4th parameters,
however, there is a harmful local minimum at the value of 0.9 and a global minimum
at 0.1 that is far away from the local minimum.





Fig. 3. Global and local minimum

Fig. 4. Global and local Pareto-optimal curve

As can be seen in Figs. 5b-5d, the population is trapped at the local optimum of 0.9
initially, as indicated by the high population density at around 0.9 on the y-axis at the
initial stage of evolution. Although the EMOEA is initially trapped at the local
optimum, the proposed heuristic reasoning of the individual examination rule with tabu
list keeps the EMOEA exploring other unvisited search space and subsequently
discovering the global optimum of 0.1. This is illustrated in Figs. 5b-5d, where the
values of the 2nd to 4th parameters start to converge to 0.1 as the evolution proceeds to
approximately generations 20, 50 and 150 for the three parameters, respectively.
The figures also show that EMOEA keeps exploring other unvisited areas even
after the global optimum has been found, which maintains the search diversity for other
possible peaks and allows the evolution to divert to other concentrated
regions. This property is useful for real-world optimization problems where the final
global optimum may drift or change due to changes in the environment or
working condition.

Fig. 6 illustrates the population distribution in the objective domain at different
stages of the evolution, where the dashed curve represents the local Pareto-front while
the solid line denotes the global Pareto-front. Individuals in the tabu list are
represented by small circles, while those in the individual list are illustrated by solid
dots. The EMOEA was trapped in the local Pareto-optimum during the first stage, as
shown in Fig. 6a. Fig. 6b shows that besides searching for other possible optima,
the EMOEA distributes the non-dominated individuals uniformly along the currently
found optimum line. Fig. 6c illustrates the situation where EMOEA has discovered
the global optimum and is in the phase of migrating from the local to the global
optimum. Finally, as depicted in Fig. 6d, the EMOEA is in the process of distributing
the non-dominated individuals along the found global optimum, while at the same
time continuing the effort of searching for other, better optimum regions.

Fig. 5. Distribution of parameters along the evolution



Apart from this, EMOEA is also evaluated in a noisy environment in the solution
space to test its robustness, in the sense that the disappearance of important
individuals from the population has little effect on the global evolution behavior [16].
For this purpose, a noisy version of a two-objective optimization problem with three variables is
constructed here, where the function being optimized contains elements of noise:



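[Eqns. (10a) and (10b), defining $f_1$ and $f_2$: expressions not recoverable from the extracted text; the surviving fragments of (10b) are "1", "0.25", "sin" and "+ 1.0".]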



Instead of performing the optimization on the 'real' parameters, $x_i$, the optimization
is performed on the 'corrupted' parameters with additive noise elements:

$x_i' = x_i + N(\sigma, \mu)$, (10c)



where $0.1 < x_1 < 1$; $-100 < x_i < 100$ $\forall i = 2, 3$ and $N(\sigma, \mu)$ is a white noise. The
probability distribution density of the noise is given as a normal distribution [17] as
indicated by eqn. 11,

$P(x \mid N(\sigma, \mu)) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{(x - \mu)^2}{2\sigma^2} \right)$ (11)

where $\mu$ and $\sigma^2$ are the mean and variance of the probability distribution density.
Fig. 7 illustrates the generated white noise along the generations with $\mu$ and $\sigma$ set as 0
and 0.1, respectively. With $N_c = 30$ and $N_t = 10$, the simulation is run for 1000
generations to acquire the steady-state performance, i.e., at this stage the optimization
error in the evolution is less likely to be due to immature individuals than to the
influence of noise in the observed environment.
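
The density of eqn. 11 and the parameter corruption of eqn. 10c can be sketched in Python as follows. This is an illustration only: the names `normal_pdf`, `corrupt` and `noisy_indices` are hypothetical, the value 0.1 is treated as a standard deviation (an assumption, since the text calls it a variance), and noise is added only to parameters 2 and 3, in line with the statement that no noise element is added to parameter 1.

```python
import math
import random

def normal_pdf(x, mu, sigma):
    """Eqn. 11: normal density with mean mu and standard deviation sigma."""
    return math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2)) / math.sqrt(2.0 * math.pi * sigma ** 2)

def corrupt(params, mu=0.0, sigma=0.1, noisy_indices=(1, 2), rng=random):
    """Eqn. 10c: return a copy of params with white noise added to the selected
    parameters (0-based indices 1 and 2 stand for parameters 2 and 3)."""
    return [x + rng.gauss(mu, sigma) if i in noisy_indices else x
            for i, x in enumerate(params)]
```

Passing `rng=random.Random(seed)` makes a noisy run repeatable, which is convenient when comparing error traces between runs.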




(c) 60th generation (d) 240th generation

Fig. 6. Population distribution at different stages of evolution 

Fig. 8 shows the trace of the error, i.e., the difference between the best-found
solution and the actual solution in the parameter domain, for parameters 2 and 3.
Parameter 1 is not considered in this observation since no noise element is
added to this parameter. As shown in Fig. 8, the optimization errors for both
parameters are large initially since the best solutions at the initial stage of evolution
are far away from the actual solutions. However, the error decreases significantly as
the candidate solutions of EMOEA evolve towards the actual solution, and finally
reaches nearly zero at the steady state. Note that the optimization error is not expected to
converge to zero due to the presence of white noise as shown in Fig. 7. Nevertheless,
as illustrated in Fig. 8, the optimization errors for both parameters at the steady-state











are much smaller than the magnitude of the noise, since the steady-state errors for both
parameters 2 and 3 are less than 0.05 while $|N(t \to \infty)|$ is more than 1.5. As shown in
Fig. 9, the proposed EMOEA has good robustness for global optimization in a noisy
environment, where it traces almost the entire actual Pareto-front in the objective
domain at the end of the evolution.




Fig. 7. Applied white noise for noise sensitivity test

Fig. 8. Optimization error along the evolution for parameters 2 and 3

Fig. 9. Population distribution at the end of evolution in the presence of noise

4. Conclusions

This paper has proposed a high-performance EMOEA, which makes use of the
integrated features of tabu search and evolutionary algorithms to keep the search
from being trapped in local optima as well as to promote population diversity along the
discovered Pareto-front for MO optimization. It has been shown that the proposed
algorithm has look-ahead capabilities and is capable of reducing the frequency of
backward drive using the concept of tabu constraints. In addition, a novel method of
lateral interference that helps to distribute non-dominated individuals uniformly along
the discovered Pareto-front is proposed. Unlike existing sharing/niching methods, the
lateral interference can be flexibly applied in either the parameter or the objective domain
and allows efficient uniform population distribution without the need for any
parameter setting. Validation of the proposed EMOEA has been performed and the
experimental results show that it has a high capability to escape from local optima as
well as to accurately identify the actual global optima in a noisy environment.

Abstract. In this paper we propose a new approach to solve bi-criterion
optimization problems with ant algorithms where several colonies of ants
cooperate in finding good solutions. We introduce two methods for
cooperation between the colonies and compare them with a multistart ant
algorithm that corresponds to the case of no cooperation. Heterogeneous
colonies are used in the algorithm, i.e. the ants differ in their preferences
between the two criteria. Every colony uses two pheromone matrices,
each suitable for one optimization criterion. As a test problem we use
the Single Machine Total Tardiness problem with changeover costs.



1 Introduction 

Ant Colony Optimization (ACO) is an evolutionary approach that has been applied
successfully to solve various combinatorial optimization problems (for an
overview see Dorigo and Di Caro [?]). In ACO, ants that found a good solution
mark their paths through the decision space by putting some amount of
pheromone along the path. The following ants of the next generation are attracted
by the pheromone so that they will search in the solution space near
good solutions.

Much work has been done to apply evolutionary methods to solve multi-criterion
optimization problems (see [?] for an overview). But only a few of these
works used the ACO principle. Mariano and Morales [?] proposed an ant algorithm
where for each objective there exists one colony of ants. In particular, they
studied problems where every objective is influenced only by parts of a solution,
i.e. an objective can be determined knowing only the relevant part of a solution.
The objectives are assumed to be ordered by importance. In every generation ant
k from colony i receives a (partial) solution from ant k of colony i - 1. The ant
then tries to improve (or extend) the (partial) solution with respect to criterion
i. When the solutions have passed through all colonies those solutions that are in
the nondominated front are allowed to update the pheromone information.
Gambardella et al. [?] developed an ant algorithm for a bi-criterion vehicle routing
problem. They used two ant colonies, one for each criterion. The two colonies
share a common global best solution which is used for pheromone update in
both colonies. Criterion 1 (the number of vehicles) is considered to be more
important than criterion 2 (the total travel time of the tours). Colony 1 tries
to find a solution with one vehicle less than the global best solution while colony
2 tries to improve the global best solution with respect to criterion 2. Whenever
colony 1 finds a new global best solution both colonies start anew (with the
new global best solution). Gagne et al. [?] tested a multi-criterion approach for
solving a single machine total tardiness problem with changeover costs and two
additional criteria. In their approach the changeover costs were considered to
be most important. The idea was to construct heuristic values for the decisions
of the ants that take all criteria into account. The amount of pheromone that
an ant adds to the pheromone matrix depends solely on the changeover costs of the
solution. All the above-mentioned approaches assume that the different criteria
can be ordered by importance, and in the multi colony approaches there is always
one colony for every objective.

In this paper we study ACO methods for multi-criterion optimization when
the objectives cannot be ordered by importance. The aim is to find different solutions
which cover the Pareto-optimal front. A multi colony approach is proposed
where the ant colonies are forced to search in different regions of the nondominated
front. It should be noted that multi colony ant algorithms have been
studied before by some authors to parallelize ACO algorithms (a short overview
is given in [?]). We use heterogeneous colonies where the ants in a colony weight
the relative importance of the two optimization criteria differently so that they
are able to find different solutions along the Pareto front. Cooperation between
the colonies is done by exchanging solutions in the global nondominated front
that are in regions which "belong to other colonies".

Our test problem, the Single Machine Total Tardiness Problem (SMTTP)
with changeover costs, is described in Section 2. A short introduction to ant
algorithms for solving the single objective versions of our test problem is given
in Section 3. Our ACO approaches for bi-criteria optimization problems are
described in Section 4. The multi colony approaches are explained in Section 5.
The test instances and parameters are described in Section 6. The results are
discussed in Section 7 and conclusions are given in Section 8.

2 The Test Problem 

In this paper we use the Single Machine Total Tardiness Problem (SMTTP) 
with changeover costs as our bi-criterion test problem. The problem is defined 
as follows. 

- Given: n jobs, where job j, $1 \le j \le n$, has a processing time $p_j$ and a due
  date $d_j$, and where for every pair of jobs i, j, $i \ne j$, there are changeover costs
  c(i, j) that have to be paid when j is the direct successor of i in a schedule.
- Find: A non-preemptive one machine schedule that minimizes the value of
  $T = \sum_{j=1}^{n} \max\{0, C_j - d_j\}$, where $C_j$ is the completion time of job j, and
  that also minimizes the sum of the changeover costs $C = \sum_{i=1}^{n-1} c(j_i, j_{i+1})$,
  where $j_1, j_2, \ldots, j_n$ is the sequence of jobs in the schedule.

T is called the total tardiness of the schedule and C is the cost of the schedule.
It is known that SMTTP is NP-hard in the weak sense [?] and is solvable in
pseudo-polynomial time [?]. The problem of finding a schedule that minimizes only
the changeover costs is equivalent to the asymmetric Shortest Hamiltonian Path
problem, which is NP-complete in the strong sense.
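
Both criteria are cheap to evaluate for a given job sequence. The short Python sketch below shows one way to compute them; the function names and the 0-based job numbering are assumptions made for the example, with `p`, `d` and `c` holding the processing times, due dates and changeover costs.

```python
def total_tardiness(schedule, p, d):
    """T: sum over all jobs of max(0, completion time - due date)."""
    t, total = 0, 0
    for j in schedule:
        t += p[j]                    # completion time C_j of job j
        total += max(0, t - d[j])
    return total

def changeover_cost(schedule, c):
    """C: sum of c[i][j] over all pairs of consecutive jobs (i, j)."""
    return sum(c[i][j] for i, j in zip(schedule, schedule[1:]))
```

For instance, with `p = [2, 3, 1]` and `d = [2, 4, 3]`, the sequence `[0, 1, 2]` has completion times 2, 5, 6 and therefore T = 0 + 1 + 3 = 4, while `[0, 2, 1]` gives T = 2; the two objectives generally favour different sequences.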



3 Ant Algorithms for Single-Criteria Optimization 
Problems 



3.1 Total Tardiness Minimization 



A variant of the ACO algorithm of Merkle and Middendorf [10] for the SMTTP
is described in this section (other ACO approaches for SMTTP can be found in
[?]). In every generation each of m ants constructs one solution. An ant selects
the jobs in the order in which they will appear in the schedule. For the selection
of a job the ant uses heuristic information as well as pheromone information. The
heuristic information, denoted by $\eta_{ij}$, and the pheromone information, denoted
by $\tau_{ij}$, are an indicator of how good it seems to have job j at place i of the
schedule. The heuristic value is generated by some problem dependent heuristic
whereas the pheromone information stems from former ants that have found
good solutions.

The next job is chosen from the set S of jobs that have not been scheduled
so far according to the probability distribution that is determined by

$p_{ij} = \frac{[\tau_{ij}]^{\alpha} [\eta_{ij}]^{\beta}}{\sum_{h \in S} [\tau_{ih}]^{\alpha} [\eta_{ih}]^{\beta}}$ (1)

The heuristic values $\eta_{ij}$ are computed according to the following modified due
date rule

$\eta_{ij} = \frac{1}{\max\{T + p_j, d_j\} - T}$ (2)

where T is the total processing time of all jobs already scheduled. The best
solution found so far is then used to update the pheromone matrix. But before
the update is done some of the old pheromone is evaporated according to

$\tau_{ij} = (1 - \rho) \cdot \tau_{ij}$

The reason for this is that old pheromone should not have too strong an influence
on the future. Then, for every job j in the schedule of the best solution found
so far some amount of pheromone $\Delta$ is added to element $\tau_{ij}$ of the pheromone
matrix, where i is the place of job j in the schedule. The algorithm stops when
some stopping criterion is met, e.g. a certain number of generations has been
done. The pseudocode for the algorithm is given below.



Ant Algorithm 1 Total Tardiness Minimization

repeat
    for ant k ∈ {1, ..., m} do
        S = {1, 2, ..., n} {set of nonscheduled jobs}
        for i = 1 to n do
            choose job j ∈ S with probability p_ij
            S := S − {j}
        end for
    end for
    for all (i, j) do
        τ_ij ⇐ (1 − ρ) · τ_ij {evaporate pheromone}
    end for
    for all (i, j) ∈ best solution do
        τ_ij ⇐ τ_ij + Δ {update pheromone}
    end for
until stopping criterion is met
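
The pseudocode above translates almost line by line into Python. The sketch below is a hedged illustration rather than the authors' implementation: the function name, the default parameter values, the uniform initial pheromone of 1.0, the fixed number of generations as stopping criterion and the 0-based job indices are all assumptions. Processing times are assumed to be positive so that the heuristic value is well defined.

```python
import random

def smttp_aco(p, d, m=10, generations=50, alpha=1.0, beta=1.0,
              rho=0.1, delta=1.0, seed=0):
    """Simplified ant algorithm for total tardiness minimization.

    p, d: processing times and due dates; returns (best schedule, best T).
    """
    rng = random.Random(seed)
    n = len(p)
    tau = [[1.0] * n for _ in range(n)]   # tau[i][j]: job j at place i
    best, best_T = None, float("inf")
    for _ in range(generations):
        for _ in range(m):
            unscheduled, schedule, t, tard = list(range(n)), [], 0, 0
            for i in range(n):
                # pheromone times the modified due date heuristic value
                weights = [tau[i][j] ** alpha
                           * (1.0 / (max(t + p[j], d[j]) - t)) ** beta
                           for j in unscheduled]
                j = rng.choices(unscheduled, weights=weights)[0]
                unscheduled.remove(j)
                schedule.append(j)
                t += p[j]
                tard += max(0, t - d[j])
            if tard < best_T:             # keep the best solution found so far
                best, best_T = schedule, tard
        for i in range(n):                # evaporate pheromone
            for j in range(n):
                tau[i][j] *= (1.0 - rho)
        for i, j in enumerate(best):      # reinforce the best schedule
            tau[i][j] += delta
    return best, best_T
```

A call such as `smttp_aco([2, 3, 1], [2, 4, 3])` returns a permutation of the three jobs together with its total tardiness; fixing `seed` makes the run repeatable.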



It has to be mentioned that in this paper we use a simplified version of
the original ant algorithm for the SMTTP problem. For example, no local
pheromone update is done by the ants, i.e. the ants do not change the pheromone
values during their search for a solution. Moreover, we do not use local
optimization. The reason for using a simplified variant is that we want to concentrate on
the algorithmic aspects that are relevant for solving a bi-criterion problem with
multiple colonies.



3.2 Changeover Cost Minimization 

An ant algorithm for the single machine scheduling problem where only the
changeover costs are relevant could be similar to the algorithm described above.
But the pheromone information should be different. For the total tardiness
value it is important at which place in the schedule a job is, whereas for the
changeover costs it is more important which job is the predecessor of a job.
Hence, a pheromone matrix is used where $\tau_{ij}$ is the desirability that job j comes
after job i in the schedule. Moreover, an additional dummy node 0 is introduced
so that c(0, j) = 0 for every job $j \in [1, n]$. Then $\tau_{0j}$ is the desirability to start
with job j. As heuristic information we use $\eta_{ij} = 1/c(i, j)$ for $i, j \in [1, n]$ and
$\eta_{0j} = 1$ for $j \in [1, n]$.

4 Ant Algorithm for Bi-Criterion Optimization Problems

In this section we describe how our ant algorithm works in the case of a single
colony of ants.



4.1 Two Pheromone Matrices 

Different optimization criteria may need different pheromone information, as
explained in the last section. Therefore we propose to use two pheromone matrices,
one for each criterion. Then every ant can work differently on the two matrices.

For the SMTTP with changeover costs we use a pheromone matrix $M = (\tau_{ij})$
for the total tardiness criterion, where $\tau_{ij}$ is the desirability that job j is at place i
of the schedule. For the changeover cost criterion a pheromone matrix $M' = (\tau'_{ij})$
is used, where $\tau'_{ij}$ is the desirability that job j comes after job i in the schedule.



4.2 Heterogeneous Colony 



To force the ants to search in different regions of the Pareto front, each of the
m ants in the colony weights the relative importance of the two optimization
criteria differently when making its decisions. More exactly, ant k, $k \in [1, m]$,
in the colony uses $\lambda_k = \frac{k-1}{m-1}$. Every ant makes its decision according to the
following probabilities:

$p_{ij} = \frac{\tau_{ij}^{\lambda\alpha} \cdot {\tau'_{ij}}^{(1-\lambda)\alpha} \cdot \eta_{ij}^{\lambda\beta} \cdot {\eta'_{ij}}^{(1-\lambda)\beta}}{\sum_{h \in S} \tau_{ih}^{\lambda\alpha} \cdot {\tau'_{ih}}^{(1-\lambda)\alpha} \cdot \eta_{ih}^{\lambda\beta} \cdot {\eta'_{ih}}^{(1-\lambda)\beta}}$ (3)

where $\eta_{ij}$ is the heuristic information that corresponds to the tardiness criterion
(cmp. Equation 2) and $\eta'_{ij}$ is the heuristic information that corresponds
to the changeover criterion (cmp. Subsection 3.2). Thus, in the extreme cases,
ant m with $\lambda = 1$ considers only the first criterion whereas ant 1 with $\lambda = 0$
considers only the second criterion.
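
The weighting of the two criteria can be made concrete with a small Python sketch. It is an illustration under stated assumptions: the function names are hypothetical, the weight rule $\lambda_k = (k-1)/(m-1)$ requires m > 1, and each argument list is assumed to already hold the pheromone or heuristic value of one unscheduled job for the decision at hand.

```python
def ant_lambda(k, m):
    """Weight of ant k (1-based) in a colony of m > 1 ants."""
    return (k - 1) / (m - 1)

def selection_probabilities(lam, tau, tau_c, eta, eta_c, alpha=1.0, beta=1.0):
    """Selection probabilities for one decision.

    tau, eta: pheromone and heuristic values for the tardiness criterion;
    tau_c, eta_c: the same for the changeover criterion.
    All lists hold one positive value per unscheduled job.
    """
    weights = [t ** (lam * alpha) * tc ** ((1 - lam) * alpha)
               * e ** (lam * beta) * ec ** ((1 - lam) * beta)
               for t, tc, e, ec in zip(tau, tau_c, eta, eta_c)]
    total = sum(weights)
    return [w / total for w in weights]
```

With `lam = 1.0` the changeover values are raised to the power 0 and drop out, so only the tardiness information decides; with `lam = 0.0` the roles are reversed.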

Merkle and Middendorf proposed an alternative method for pheromone
evaluation where the pheromone values corresponding to older decisions are
taken into account. Instead of using $\tau_{ij}$ in the probability formula they used
$\sum_{k=1}^{i} \tau_{kj}$. This so-called summation evaluation method was successfully
applied to several scheduling problems. For the weighted version of SMTTP a
combination of standard evaluation and a weighted version of summation
evaluation has been shown to be very successful. Formally, this combined version uses

$$\tau^{*}_{ij} := c \cdot x_i \cdot \tau_{ij} + (1 - c) \cdot y_i \cdot \sum_{k=1}^{i} \gamma^{i-k} \tau_{kj} \qquad (4)$$

where c is the parameter that determines the relative influence of weighted
summation evaluation, and $\gamma$ is the parameter that determines the relative
influence of pheromone values corresponding to older decisions. Parameters
$x_i := \sum_{h \in S} \sum_{k=1}^{i} \gamma^{i-k} \tau_{kh}$ and $y_i := \sum_{h \in S} \tau_{ih}$ are factors to adjust the relative
influence of local and summation evaluation. Observe that for c = 1 the standard
evaluation is obtained and for c = 0 pure summation evaluation.
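
A minimal Python sketch of the combined evaluation (4) follows. It assumes that the two adjustment factors are the sums of the summation values and of the local values over the not yet scheduled jobs; the function name, the 0-based indexing and the argument names are illustrative only.

```python
def combined_evaluation(tau, i, unscheduled, c=0.6, gamma=0.9):
    """Combined pheromone value for every unscheduled job j at place i.

    tau[k][j] is the pheromone value for job j on place k (0-based).
    c weights standard against weighted summation evaluation,
    gamma discounts pheromone values of earlier places.
    """
    def weighted_sum(j):
        # sum over places k = 0..i of gamma^(i-k) * tau[k][j]
        return sum(gamma ** (i - k) * tau[k][j] for k in range(i + 1))

    # Assumed normalising factors: each term is scaled by the total
    # mass of the other one, so both terms have comparable magnitude.
    x_i = sum(weighted_sum(h) for h in unscheduled)
    y_i = sum(tau[i][h] for h in unscheduled)
    return {j: c * x_i * tau[i][j] + (1 - c) * y_i * weighted_sum(j)
            for j in unscheduled}
```

Setting c = 1 makes the result proportional to the local values, while c = 0 makes it proportional to the discounted column sums.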

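Therefore it might be advantageous to use different methods for the evaluation
of the two pheromone matrices. If, e.g., summation evaluation is used for the
first criterion, the probabilities used by the ant are

$$p_{ij} = \frac{\left(\sum_{k=1}^{i} \tau_{kj}\right)^{\lambda\alpha} \cdot {\tau'_{ij}}^{(1-\lambda)\alpha} \cdot \eta_{ij}^{\lambda\beta} \cdot {\eta'_{ij}}^{(1-\lambda)\beta}}{\sum_{h \in S} \left(\sum_{k=1}^{i} \tau_{kh}\right)^{\lambda\alpha} \cdot {\tau'_{ih}}^{(1-\lambda)\alpha} \cdot \eta_{ih}^{\lambda\beta} \cdot {\eta'_{ih}}^{(1-\lambda)\beta}} \qquad (5)$$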



4.3 Pheromone Update 

When all m ants of a generation have found a solution it has to be decided
which of the ants are allowed to update. Here we propose that all ants in the
nondominated front of the current generation are allowed to update. An ant
that updates will update both pheromone matrices M and M'. Note that this
rule makes sense only when there are not too few ants in a colony. With only
very few ants in a colony the ants differ much in their $\lambda$-values. Hence, no real
competition for the best solutions occurs since the ants will search in different
regions of the nondominated front. Thus even the ants with weak solutions will
have good chances to do an update. One way to solve this problem is to allow
only those ants to update that have found solutions which are good compared
to the nondominated front of all solutions that have been found so far.

To give every generation of ants the same influence, the amount of pheromone
that is added to a pheromone matrix is the same in every generation. Therefore,
every ant that is allowed to update adds an amount of 1/l, i.e. $\tau_{ij} = \tau_{ij} + 1/l$,
where l is the number of ants that are allowed to update in the current generation.
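
The update rule can be sketched in Python as follows. This is a simplified illustration, not the authors' implementation: it assumes both criteria are minimised, represents a schedule as a list of job indices, and leaves out pheromone evaporation. All function names are hypothetical.

```python
def dominates(a, b):
    """True if a is no worse than b in every criterion and better in one."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))

def nondominated(objectives):
    """Indices of the objective vectors that no other vector dominates."""
    return [i for i, a in enumerate(objectives)
            if not any(dominates(b, a)
                       for j, b in enumerate(objectives) if j != i)]

def update_pheromones(M, M2, schedules, objectives):
    """Every ant in the nondominated front adds 1/l to both matrices."""
    front = nondominated(objectives)
    amount = 1.0 / len(front)            # l = number of updating ants
    for idx in front:
        schedule = schedules[idx]
        for place, job in enumerate(schedule):
            M[place][job] += amount      # place-job matrix
        for prev, nxt in zip(schedule, schedule[1:]):
            M2[prev][nxt] += amount      # job-job matrix
    return front
```

Because each updating ant adds 1/l per matrix entry it touches, the total amount added to a matrix per generation does not depend on how many ants are in the front.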



5 The Multi Colony Approach 

In our multi colony ant algorithm several colonies of ants cooperate and specialize
to find good solutions in different regions of the Pareto front. All p colonies have
the same number of m/p ants.

5.1 Pheromone Update 

In the single colony algorithm only ants in the nondominated front of the colony 
are allowed to update. Hence, a reasonable way for pheromone update in the 
multi colony algorithm is that only those ants update that found a solution 
which is in the local nondominated front of the colony. This corresponds to the
case where there is no cooperation between the colonies. Therefore the results are
the same as with a multistart approach where a single colony ant algorithm is run 
several times and the global nondominated front at the end is determined from 
the nondominated fronts of all runs. In the following we describe how to introduce 
collaboration between the colonies. For this, the ants in a generation put their
solutions in a global solution pool that is shared by all colonies. The pool is used
to determine the nondominated front of all solutions in that generation. Then, 
only ants that found a solution which is in the global nondominated front are 
allowed to update. We study two different methods to determine in which colony 
an ant should update the pheromone matrix: 

1. Method 1 - update by origin: an ant updates only in its own colony (compare
Figure 1).

2. Method 2 - update by region in the nondominated front: the sequence of
solutions along the nondominated front is split into p parts of equal size.
Ants that have found solutions in the ith part update in colony i, $i \in [1, p]$
(compare Figure 2). More formally, the solutions in the nondominated front
are sorted with respect to the first criterion (it does not matter whether the
list is sorted according to the first or the second criterion). Let L be the
sorted list. The sorted list is then split into parts $L_1, L_2, \ldots, L_p$ so that their
sizes differ by at most one. All ants that found solutions in list $L_i$, $i \in [1, p]$,
will update the pheromone matrix of colony i.
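
The splitting step of method 2 can be sketched as follows. The sketch assumes the global nondominated front is given as a list of (ant identifier, objective pair) entries; which parts receive the extra element when the front size is not divisible by p is an arbitrary choice made here, since the text only requires that the sizes differ by at most one.

```python
def split_front(front, p):
    """Assign the ants of a nondominated front to p colonies by region.

    front: list of (ant_id, (f1, f2)) pairs.
    Returns a list of p lists; entry i holds the ants that update colony i.
    """
    ordered = sorted(front, key=lambda item: item[1][0])  # sort by criterion 1
    base, extra = divmod(len(ordered), p)
    parts, start = [], 0
    for i in range(p):
        size = base + (1 if i < extra else 0)  # sizes differ by at most one
        parts.append([ant for ant, _ in ordered[start:start + size]])
        start += size
    return parts
```

For a front of five solutions and p = 2, the first colony receives the three solutions with the smallest values of the first criterion and the second colony the remaining two.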

The first method imposes a stronger selection pressure on the ants that are 
allowed to update. It is not enough for an ant to have a solution in the local 
nondominated front of its colony. Instead, the solution must be in the global 
nondominated front. This method might be advantageous because other colonies 
help to detect which of the solutions in the local nondominated front of a colony 
might be weak. An interesting observation is that the update by origin method
might also force the colonies to search in different regions of the nondominated
front. It is more likely that a solution from the local nondominated front of a 
colony might also be in the global nondominated front when only a few solutions 
from other colonies are in the same region. Hence, it is more likely that an ant 
with solutions in less dense areas of the nondominated front will be allowed to 
update and thereby will influence the further search process. 

The aim of method 2 is to explicitly guide the ant colonies to search in 
different regions of the Pareto front. 



5.2 Heterogeneous Colonies 

As in the single colony algorithm the ants in a colony use different $\lambda$-values,
i.e. when making their decisions they weight the relative importance of the two
optimization criteria differently. More exactly, ant k in colony i, $i \in [1, p]$, uses
$\lambda_k = (k-1)/(m/p-1)$, $k \in [1, m/p]$. We call this rule 1 for defining the $\lambda$-values.

An alternative could be to use different $\lambda$-values in the colonies so that the
$\lambda$-values of the ants in the colonies are in different subintervals of [0, 1]. Thus the
colonies weight the optimization criteria differently.

- Rule 2 - disjoint $\lambda$-intervals: ant k, $k \in [1, m/p]$, in colony i has $\lambda$-value

Fig. 1. Update by origin: Every ant with a solution in the nondominated front updates
in its own colony.




Fig. 2. Update by region in the nondominated front: Ants with a solution in the
nondominated front update in the colony that corresponds to the region of the
solution.



- Rule 3 - overlapping $\lambda$-intervals: the $\lambda$-interval of colony i overlaps by 50%
with the $\lambda$-intervals of colony i - 1 and colony i + 1. Formally, colony i has
ants with $\lambda$-values in $[(i-1)/(p+1), (i+1)/(p+1)]$ (compare Figure 3).
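
A small Python sketch may help to compare rule 1 and rule 3. It assumes that the ants of a colony are spaced evenly over the colony's interval; for rule 3 this spacing is an assumption, since the text only states the interval bounds. Function names are hypothetical and each colony needs at least two ants.

```python
def lambdas_rule1(ants_per_colony):
    """Rule 1: every colony spreads its ants evenly over [0, 1]."""
    n = ants_per_colony
    return [k / (n - 1) for k in range(n)]

def lambdas_rule3(i, p, ants_per_colony):
    """Rule 3: colony i (1-based) of p colonies uses the interval
    [(i-1)/(p+1), (i+1)/(p+1)], here with evenly spaced ants (assumed)."""
    lo, hi = (i - 1) / (p + 1), (i + 1) / (p + 1)
    n = ants_per_colony
    return [lo + (hi - lo) * k / (n - 1) for k in range(n)]
```

For p = 4 colonies the intervals are [0, 0.4], [0.2, 0.6], [0.4, 0.8] and [0.6, 1.0], so each interval shares half of its length with each neighbour.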



Fig. 3. $\lambda$-values when using rule 3: 4 colonies with 7 ants each. [Figure: $\lambda$-values of colonies 1 to 4 on an axis from 0.0 to 1.0.]



6 Test Instances and Parameters



We tested our ant algorithms on problem instances where the jobs and their
deadlines were generated according to the following rule that is often used to create
instances for the SMTTP: for each job $j \in [1, 100]$ an integer processing time
$p_j$ is taken randomly from the interval [1, 100] and an integer due date $d_j$ is
taken randomly from the interval

$$\left[ \sum_{j=1}^{100} p_j \cdot \left(1 - TF - \frac{RDD}{2}\right),\; \sum_{j=1}^{100} p_j \cdot \left(1 - TF + \frac{RDD}{2}\right) \right].$$

The value RDD (relative range of due dates) determines the length of the
interval from which the due dates were taken. TF (tardiness factor) determines
the relative position of the centre of this interval between 0 and $\sum_{j=1}^{100} p_j$. We
chose the values for TF from the set {0.4, 0.5, 0.6} and RDD was set to RDD =
0.6. The changeover costs between the jobs were chosen randomly from one of
the sets [1,100] and [50,100].
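
The generation rule can be sketched in Python as follows. The sketch makes a few assumptions that the text does not fix: uniform integer sampling, rounding of the interval bounds to the nearest integer, and a zero changeover cost from a job to itself. The function and parameter names are illustrative.

```python
import random

def generate_instance(n=100, tf=0.6, rdd=0.6, cost_range=(1, 100), seed=0):
    """Random SMTTP instance with changeover costs.

    Returns processing times p, due dates d and a changeover cost matrix.
    """
    rng = random.Random(seed)
    p = [rng.randint(1, 100) for _ in range(n)]
    total = sum(p)
    lo = round(total * (1 - tf - rdd / 2))   # lower end of due date interval
    hi = round(total * (1 - tf + rdd / 2))   # upper end of due date interval
    d = [rng.randint(lo, hi) for _ in range(n)]
    costs = [[0 if a == b else rng.randint(*cost_range) for b in range(n)]
             for a in range(n)]
    return p, d, costs
```

Calling generate_instance(tf=0.4, cost_range=(50, 100)) would, under these assumptions, produce an instance of the second family with TF = 0.4.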

The parameters used for the test runs are: $\alpha = 1$, $\beta = 1$, $\rho = 0.02$. Pheromone
evaluation was done as described in Subsection 4.2, where a combination of
summation evaluation and standard evaluation is used for matrix 1. For the
corresponding parameters the values c = 0.6 and $\gamma = 0.9$ were used (these values
were shown to be suitable for the weighted SMTTP). The number of ants
in every generation was m = 100. When using several colonies the 100 ants were
distributed equally to the colonies. Every element of the pheromone matrices
was initialized with 1.0. Every test was performed with 11 runs. Every run was
stopped after 300 generations.



7 Results 

The performance of the multi colony approach was tested on 6 problem instances:
three instances with changeover costs in [1,100] and three instances
with changeover costs in [50,100] (and $TF \in \{0.4, 0.5, 0.6\}$). The outcome of
each single run of the ant algorithm is the subset of all nondominated solutions
in the set of all solutions found during the run of the algorithm. The median
attainment surfaces for runs with 1 colony and 10 colonies are shown in Figure
4 (the median attainment surface is the median line of all the attainment surfaces
connecting the Pareto front in each of the 11 runs). For the tests with 10
colonies we used pheromone update method 2 (update by region in the Pareto
front) and $\lambda$-rule 3 with overlapping $\lambda$-intervals. The figure shows that the median
attainment surfaces of the runs with 10 colonies are nearly always better
than those for the 1 colony runs. Only for two instances with changeover costs in
[50, 100] are the median attainment surfaces of the 1 colony runs slightly better
in a small region. But for these instances the 10 colonies found solutions with
much smaller changeover costs.

Fig. 4. Median attainment surfaces obtained with 1 colony and 10 colonies. Left
part: three instances with changeover costs in [1,100]. Right part: three instances with
changeover costs in [50,100]. [Figure axis: changeover costs.]



In the following we present some results where we compare the different
pheromone update methods and the $\lambda$-rules. For these tests we used the problem
instance with TF = 0.6 and changeover costs in [1, 100].

Figure 5 shows the convergence behaviour of the 10 colonies algorithm. The
median attainment surfaces obtained after different numbers of generations are
depicted. It can be seen that the median attainment surfaces after 200 generations
and 300 generations differ only slightly. The results obtained for 1 colony
were similar. Hence, our other results, which were all obtained after 300 generations,
should not change much with a higher number of generations.

The median attainment surfaces obtained with the different pheromone update
methods are shown in Figure 6. The simple multistart strategy without
cooperation between the colonies is worst along the whole surface. For the
smaller cost values method 1 (update by origin) and method 2 (update by region
in the nondominated front) show nearly the same performance. But the median
attainment surface obtained with method 2 is the best for medium and small total
tardiness values.

Figure 7 shows from which colony the ants stem that found solutions which
are in the final nondominated front (i.e. after 300 generations). It can be seen
that the pheromone update method 2 (by region in the nondominated front)
forces all colonies to specialize to those regions from where the ants come that
are allowed to update in that colony. Also, method 1 (update by origin) seems
to force the colonies to specialize, though this effect is not as clear as for method
2. Clearly, for method 1 it cannot be predicted to which region a colony will
specialize.

Fig. 5. Convergence behaviour with 10 colonies: median attainment surfaces are shown
for generations 10, 50, 100, 200 and 300.

Fig. 6. Influence of the pheromone update method when using 10 colonies: method
1 - update by origin; method 2 - update by cutting the nondominated front; the
multistart approach.

The influence of the $\lambda$-rules when using pheromone update method 1 (by
origin) is shown in Figure 8 for 2 and 5 colonies. When using rule 2 with disjoint
$\lambda$-intervals the case of 2 colonies clearly shows that colonies are forced to
specialize to different regions of the nondominated front. The results for rule
2 in the centres of these regions are quite good and better than for rule 3. But in
the middle of the nondominated front and in the extreme regions the results
for rule 2 are poor. Rule 3 shows a much more balanced behaviour. Rule 1 (all
colonies have ants with $\lambda$-values in [0, 1]) performed worse than rule 3 along the
whole median attainment surface (not shown in the figure). The results for 5
colonies are similar but the differences between the rules are not as big since the
colonies can specialize to smaller regions. For 10 colonies only small differences
were found (not shown here).

Fig. 7. Solutions in the nondominated front after 300 generations when using 10
colonies: changeover costs and number of the colony from which the corresponding ant
stems. Left: pheromone update by region in the nondominated front (for all 11 runs).
Right: pheromone update by origin (only for 1 run).




Fig. 8. Influence of the $\lambda$-rule when using 2 and 5 colonies: rule 2 - disjoint intervals;
rule 3 - overlapping intervals. [Figure: two panels, each with axis label changeover costs.]



Finally, we show that it is advantageous to use two pheromone matrices per
colony (compare Subsection 4.1): a place-job matrix suitable for minimizing the
total tardiness and a job-job matrix suitable for minimizing changeover costs
(see Figure 9). When using only the place-job matrix in every colony the total
tardiness values are good when changeover costs do not matter. But the ants
were not able to find solutions with small changeover costs. When using only
the job-job matrix in each colony we have the opposite effect: the ants can find
solutions with small changeover costs but no solutions with small total tardiness
values. Using two matrices per colony performed nearly always better along the
whole median attainment surface than using only one of the matrices.

Fig. 9. Influence of number of matrices per colony: only one place-job matrix, only one
job-job matrix, one place-job matrix and one job-job matrix. Test runs were performed
with 10 colonies, pheromone update by region in the nondominated front and overlapping
$\lambda$-intervals. [Figure axis: changeover costs.]



8 Conclusions and Future Work 

We studied an approach to solve bi-criterion optimization problems with a multiple
colony ant algorithm. It was shown that cooperation between the colonies
makes it possible to find good solutions along the whole Pareto front. Heterogeneous
colonies were used where the ants have different preferences. It was shown that the
use of two different kinds of pheromone matrices in every colony, each matrix
suitable for one optimization criterion, is advantageous.

Currently we are studying methods to dynamically adapt the $\lambda$-values of the
ants in a colony instead of using predetermined $\lambda$-intervals. Moreover, the
introduction of fitness sharing is under investigation. We expect that the use of
elitist ants that update the pheromone matrices for solutions that are in the
nondominated front of all solutions found so far will improve our results. Also, local
pheromone update might be especially useful for multi-criterion optimization
since it forces the ants to search for different solutions.

An interesting topic for future research will be to study how the pheromone
update by region in the nondominated front can be applied to optimization
problems with more than two criteria. It is not obvious what regions should be
used then in the nondominated front.



Abstract. In this paper, three evolutionary algorithms are discussed for solving
a three-criteria optimisation problem of finding a set of Pareto-optimal task
assignments. Finally, the algorithm with a tabu mutation is recommended for
solving an established multiobjective optimisation dilemma. Some numerical
results are submitted.



1 Introduction 

An adaptive evolutionary algorithm and an adaptive evolution strategy are important
techniques for solving multiobjective optimisation problems related to task assignment
that minimize the workload of a bottleneck computer and the cost of machines [1].
On the other hand, tabu search is a powerful meta-heuristic approach, which has
been applied to crucial applications in engineering, economics and science [4]. We
propose a new version of a multicriteria evolutionary algorithm with a tabu search as
an advanced mutation operation.

Finding allocations of program modules is a major design problem for distributed
computer systems [5]. Program component allotments may diminish the total time of
a program execution by taking advantage of the particular properties of some
workstations or of the computer load. Three criteria are used to evaluate the quality
of a module allocation: the processing load of the bottleneck machine, the
cost of computers, and the total numerical performance of workstations.

An efficient network flow algorithm has been applied to the minimization of the
program execution cost in a two-computer system [23]. If the number of computers is
greater than 3 or the memory in a computer is constrained, then the problem of
minimizing the program completion cost by task distribution is NP-hard [14]. If a tree or a
parallel-sequence graph represents the structure of the intermodule communication,
then efficient algorithms based on the shortest path procedure can be developed for
finding an optimal solution [5]. Task assignment problems are related to scheduling
problems [25]; however, there are several important differences between them. In
task assignment problems other evaluation criteria are used, and different constraints
are respected. Although a task assignment model is less complex, the most commonly
used problems are NP-hard.

Several parameters can be applied to measure the quality of a module allocation.
If the above optimisation goals conflict, then a task assignment problem can be
formulated as a multiobjective optimisation problem. The problem of finding the
optimal program module assignments was discussed in [2], where the total computer
cost is in conflict with the execution time of a distributed sequential program.



2 Model of Distributed Processing 



A program module can be activated several times during the program lifetime. Some
processes (tasks) are associated with the execution of a program module. As a result, a set
of program modules $\{M_1, \ldots, M_m, \ldots, M_M\}$ is mapped into a set of tasks $\{T_1, \ldots, T_v, \ldots, T_V\}$.

Let the task $T_v$ be executed on computers of several sorts taken from
the set $\Pi = \{\pi_1, \ldots, \pi_j, \ldots, \pi_J\}$. Workstations are allocated to the nodes from the
set $W = \{w_1, \ldots, w_i, \ldots, w_I\}$. In the node $w_i$ the computer is chosen from the set of
computer sorts $\Pi$.

We assume one and only one computer is allocated to each node. This implies the
computer allocation constraints, as follows:

$$\sum_{j=1}^{J} x^{\pi}_{ij} = 1, \quad i = 1, \ldots, I, \qquad (1)$$

where $x^{\pi}_{ij} = 1$ if $\pi_j$ is assigned to $w_i$, and $x^{\pi}_{ij} = 0$ otherwise.

There are I equations, one for each node number i that belongs to the set $\{1, \ldots, I\}$.
Because each task is allocated to one node, the task constraints are formulated as
below:

$$\sum_{i=1}^{I} x^{m}_{vi} = 1, \quad v = 1, \ldots, V, \qquad (2)$$

where $x^{m}_{vi} = 1$ if task $T_v$ is assigned to $w_i$, and $x^{m}_{vi} = 0$ otherwise.

The following vector defines the assignment of tasks to computers:

$$x = [x^{m}_{11}, \ldots, x^{m}_{1I}, \ldots, x^{m}_{vi}, \ldots, x^{m}_{VI}, x^{\pi}_{11}, \ldots, x^{\pi}_{1J}, \ldots, x^{\pi}_{ij}, \ldots, x^{\pi}_{IJ}]^{T} \qquad (3)$$

Restrictions (1) and (2) reduce the number of allotments x from $2^{I(V+J)}$ to $I^{V} J^{I}$.



3 Valuations of Task Assignments

The cost of parallel program execution is the most commonly used measure for
evaluating an allocation [5, 23]. Another measure is the load of the bottleneck computer
[6, 18].

The workload $Z_i^{+}(x)$ of a computer allotted to the ith node for the allocation x is
given by the following formula:

$$Z_i^{+}(x) = \sum_{j=1}^{J} \sum_{v=1}^{V} t_{vj} x^{m}_{vi} x^{\pi}_{ij} + \sum_{v=1}^{V} \sum_{\substack{u=1 \\ u \neq v}}^{V} \sum_{\substack{i_2=1 \\ i_2 \neq i}}^{I} \tau_{vu} x^{m}_{vi} x^{m}_{u i_2}, \qquad (4)$$

where

$t_{vj}$ - the overhead performing time of the task $T_v$ by the computer $\pi_j$,

$\tau_{vu}$ - the total communication time between the task $T_v$ and the task $T_u$.

A computer with the heaviest load $Z_i^{+}(x)$ is the bottleneck machine in the system,
and its workload is the critical value that should be minimized.

The workload of the bottleneck computer is given by:

$$Z_{\max}(x) = \max_{i = 1, \ldots, I} \{ Z_i^{+}(x) \} \qquad (5)$$

Another measure of the task assignment is the cost of computers, which can be
calculated according to the following formula:

$$F_2(x) = \sum_{i=1}^{I} \sum_{j=1}^{J} \kappa_j x^{\pi}_{ij}, \qquad (6)$$

where $\kappa_j$ corresponds to the cost of the computer $\pi_j$.

The third measure of the task assignment is the total amount of computer performance,
which can be calculated according to the following formula:

$$\tilde{F}_2(x) = \sum_{i=1}^{I} \sum_{j=1}^{J} \vartheta_j x^{\pi}_{ij}, \qquad (7)$$

where $\vartheta_j$ represents the numerical performance of the computer $\pi_j$ for an assumed
benchmark.
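
The three measures can be evaluated with a short Python sketch. It is an illustration under several assumptions that the text does not spell out: the assignment is given in integer form (a node index per task and a computer sort per node), communication time is charged to a node only for partner tasks placed on a different node, and all names (task_node, node_comp, comm, kappa, perf) are hypothetical.

```python
def criteria(task_node, node_comp, t, comm, kappa, perf):
    """Bottleneck workload, computer cost and total performance.

    task_node[v] : node index of task v
    node_comp[i] : computer sort placed in node i
    t[v][j]      : execution time of task v on computer sort j
    comm[v][u]   : communication time between tasks v and u
    kappa[j]     : cost of computer sort j
    perf[j]      : numerical performance of computer sort j
    """
    loads = []
    for i, j in enumerate(node_comp):
        local = [v for v, node in enumerate(task_node) if node == i]
        load = sum(t[v][j] for v in local)
        # Communication with tasks on other nodes (assumed reading).
        load += sum(comm[v][u] for v in local
                    for u, node in enumerate(task_node) if node != i)
        loads.append(load)
    z_max = max(loads)                               # bottleneck workload
    cost = sum(kappa[j] for j in node_comp)          # cost of computers
    performance = sum(perf[j] for j in node_comp)    # total performance
    return z_max, cost, performance
```

The integer form is used here only because it satisfies the one-computer-per-node and one-node-per-task constraints by construction.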



4 Formulation of Multi-criterion Optimisation Problem 

An allocation that is optimal with respect to the cost of parallel program execution
does not guarantee a balanced load, because its most heavily loaded workstation
might carry a heavier load than the bottleneck machine of another task allocation
in the distributed system. The workload of the bottleneck computer can therefore be
used as a measure of allocation quality in systems where minimization of the
response time is also required [6].

Let (X, F, P) be the multi-criterion optimisation problem of finding a representation
of the Pareto-optimal solutions. It is established as follows:

1) X - an admissible solution set

$$X = \{ x \in B^{I(V+J)} \mid \sum_{i=1}^{I} x^{m}_{vi} = 1,\ v = 1, \ldots, V;\ \sum_{j=1}^{J} x^{\pi}_{ij} = 1,\ i = 1, \ldots, I \}$$

where B = {0, 1}

2) F - a vector quality criterion

$$F: X \rightarrow R^{3} \qquad (8)$$

where

R - the set of real numbers,

$F(x) = [Z_{\max}(x), F_2(x), \tilde{F}_2(x)]^{T}$,

$Z_{\max}(x)$ is calculated by (5),

$F_2(x)$ is calculated by (6),

$\tilde{F}_2(x)$ is calculated by (7)

3) P - the Pareto relationship [2, 7].

The relationship P is a subset of the product Y×Y, where the evaluation set is Y = F(X).
If $a \in Y$, $b \in Y$, and $a_n < b_n$ for $n = 1, \ldots, N$, then the pair of evaluations
$(a, b) \in P$. The Pareto relationship refers to the minimization of all criteria.
There is no task allocation $a \in X$ such that $(F(a), F(x^{*})) \in P$ for the Pareto-optimal
assignment $x^{*} \in X$.



5 Adaptive Genetic Algorithm 

Genetic algorithms (GA) have been applied to various optimisation
problems [17]. The vector evaluated genetic algorithm (VEGA) has been considered for
solving multi-criterion optimisation problems [19].

Selection with hierarchical tournaments has been considered [10], where two
randomly chosen solutions are compared. The hierarchically better alternative is
chosen and included in a mating pool of potential parents. A selection probability is
calculated for the most important objective. A random choice is carried out twice
according to the roulette rule.

A ranking system for non-dominated individuals has been introduced to avoid
discrimination against interior Pareto alternatives [12]. It has been extended by Srinivas and
Deb [22]. If some admissible solutions are in a population, then the Pareto-optimal
individuals are determined and get the rank 1. Afterwards, they are
temporarily removed from the population. Then the new Pareto-optimal alternatives
are found in the reduced population and get the rank 2. The level is increased
and the procedure is repeated until the set of admissible solutions is exhausted.
All non-dominated individuals have the same reproduction fitness because of
their equal rank. Goldberg's ranking is thus based on dominance layers.
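
The layer-by-layer ranking can be sketched in Python as follows. The sketch assumes minimisation of all criteria and the usual dominance test (no worse everywhere, better somewhere); the function names are illustrative.

```python
def dominates(a, b):
    """True if a is no worse than b in every criterion and better in one."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))

def layer_ranks(points):
    """Rank 1 for the non-dominated points, rank 2 for the points that are
    non-dominated once rank 1 is removed, and so on."""
    ranks = [0] * len(points)
    remaining = set(range(len(points)))
    level = 1
    while remaining:
        layer = {i for i in remaining
                 if not any(dominates(points[j], points[i])
                            for j in remaining if j != i)}
        for i in layer:
            ranks[i] = level
        remaining -= layer      # temporarily remove the current layer
        level += 1
    return ranks
```

For the evaluations (1, 5), (2, 2), (3, 3), (5, 1) and (4, 4) the ranks are 1, 1, 2, 1 and 3.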

The fitness of a non-feasible solution is equal to the difference between the
maximal penalty $P_{\max}$ in a population and the solution penalty. If x is admissible, then
the fitness function value is estimated as below:

$$f(x) = P_{\max} - r(x) + L + 1, \qquad (9)$$

where r(x) denotes the rank of an admissible solution and L is the population size.

Another ranking procedure has been introduced by Fonseca and Fleming [8]. It
assigns each individual a rank based on the number of other individuals by which it is
dominated. A niching procedure modifies these ranks. The surface region of the
Pareto front is divided by the size of the population. The number of other members
falling within the sub-area of any individual is taken to establish the niching penalty
for that individual [9].
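
The domination count used by this second ranking can be sketched briefly. The sketch assumes minimisation and returns the raw number of dominating individuals; how that count is turned into a rank value and how the niching penalty is applied are left out, and the function names are illustrative.

```python
def dominates(a, b):
    """True if a is no worse than b in every criterion and better in one."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))

def domination_counts(points):
    """For each point, the number of other points that dominate it."""
    return [sum(1 for j, q in enumerate(points) if j != i and dominates(q, p))
            for i, p in enumerate(points)]
```

Unlike the layer-based ranking, a point that is dominated by many individuals is penalised more strongly than one dominated by a single individual, even if both lie in the same layer.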

Sheble and Britting have observed that the quality of attained solutions increases in
optimisation problems with one criterion if the crossover probability and the mutation
rate are changed in an adaptive way [21]. We apply this approach to
a multicriteria genetic algorithm with a ranking procedure (Fig. 1).

The proposed adaptive multiobjective genetic algorithm AMGA may be applied to
a broad class of multi-criterion optimisation problems. Binary vectors
represent solutions in genetic algorithms, which is a crucial constraint.

We now discuss the adaptive change of the crossover probability $p_c$ and
the mutation rate $p_m$. In the initial population the crossover probability is 1 and each
pair of potential parents is obligatorily taken for the crossover operation.
A crossover operation supports the finding of a high-quality solution area in the
search space. It is important in the early search stage. As the generation number t
increases, the crossover probability decreases according to the formula
$p_c = e^{-t/T_{\max}}$, where $T_{\max}$ is the maximal number of generations.

The mutation rate is 0 in the initial generation. Mutation is an operation that can support
finding locally optimal solutions in the areas determined by individuals in the
population. Mutation combines with the crossover procedure to focus the search towards
the global optimum. The value of $p_m$ increases exponentially according to the formula
$p_m = e^{0.05 t/T_{\max}} - 1$. In the final population 5.13% of the bits are chosen for
bit mutation.
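
The two schedules can be checked numerically with a few lines of Python. The mutation formula used here is the one consistent with the stated end points (0 in the initial generation, 5.13% in the final one); the function names are illustrative.

```python
import math

def crossover_probability(t, t_max):
    """Crossover probability: 1 at t = 0, decaying to 1/e at t = t_max."""
    return math.exp(-t / t_max)

def mutation_rate(t, t_max):
    """Mutation rate: 0 at t = 0, rising to e^0.05 - 1 at t = t_max."""
    return math.exp(0.05 * t / t_max) - 1.0
```

At t = T_max the crossover probability has dropped to about 0.37, while the mutation rate has reached e^0.05 - 1, which is about 0.0513.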

In this paper, we take the view that using functions of the generation number for
mutation and crossover makes an evolutionary algorithm adaptive. According to
another meaning, however, adaptivity refers to operators that change as a result of
the search process (e.g. population diversity), not as a function of the generation
number.



6 Level of Convergence to Pareto Front



The AMGA with a few modifications is able to find task assignment representations
for several multiobjective optimisation problems. Simulation results corroborate that
the AMGA is capable of finding the set of Pareto-suboptimal solutions for
problem (8).



BEGIN
   t := 0, set the even size of population L
   generate randomly initial population P(t)
   calculate ranks r(x) and fitness f(x), x ∈ P(t)
   finish := FALSE
   WHILE NOT finish DO
   BEGIN /* new population */
      t := t + 1, P(t) := ∅
      calculate selection probabilities p_s(x), x ∈ P(t-1)
      FOR L/2 DO
      BEGIN /* reproduction cycle */
         p_c := e^(-t/T_max), p_m := e^(0.05 t/T_max) - 1
         • proportional selection of a potential parent pair (a, b) from the population P(t-1)
         • simple crossover of the parent pair (a, b) with the adaptive crossover probability p_c
         • bit mutation of the offspring pair (a', b') with the adaptive mutation probability p_m
         • P(t) := P(t) ∪ {a', b'}
      END
      calculate ranks r(x) and fitness f(x), x ∈ P(t)
      IF (P(t) converges OR t ≥ T_max) THEN finish := TRUE
   END
END



Fig. 1. An adaptive multicriteria genetic algorithm AMGA 

Let the Pareto points $\{P_1, P_2, \ldots, P_U\}$ be given for an instance of the task
assignment problem (8). The level of convergence to the Pareto front can measure the
quality of the obtained set of solutions. If the AMGA finds the efficient point
$(A_{u1}, A_{u2}, A_{u3})$ for a given cost of computers, then there is the uth Pareto result
$(P_{u1}, P_{u2}, P_{u3})$ with the same cost of computers.

The distance between the points $(A_{u1}, A_{u2}, A_{u3})$ and $(P_{u1}, P_{u2}, P_{u3})$ is calculated
according to $\sqrt{(P_{u1} - A_{u1})^2 + (P_{u3} - A_{u3})^2}$. If the point $(A_{u1}, A_{u2}, A_{u3})$ is not
discovered by the algorithm, then we assume the distance is
$\sqrt{(P_{u1} - A_1^{\max})^2 + (P_{u3} - A_3^{\min})^2}$, where $A_1^{\max}$
is the maximum load of the bottleneck computer for the instance of problem
(8), and $A_3^{\min}$ is the minimum performance of computers for the instance of
problem (8).

The level of convergence to the Pareto front is calculated as follows:

$$S = \sum_{u=1}^{U} \sqrt{(P_{u1} - A_{u1})^2 + (P_{u3} - A_{u3})^2} \qquad (10)$$

An average level S is calculated for several runs of the evolutionary algorithm.



7 Tabu Search as Mutation Operation 



A survey of the state of the art for multi-criterion evolutionary algorithms is given
in [7, 8, 24, 26]. Let the adaptive multicriteria evolutionary algorithm AMEA be
the base for creating the AMEA with tabu mutation. The flow scheme of the AMEA is
similar to that of the AMGA, but the initial population is constructed in a specific manner.
Individuals are constructed to satisfy constraints (1) and (2) by introducing an integer
representation of chromosomes, as follows:

$$X = [X^{m}_{1}, \ldots, X^{m}_{v}, \ldots, X^{m}_{V}, X^{\pi}_{1}, \ldots, X^{\pi}_{i}, \ldots, X^{\pi}_{I}]^{T}, \qquad (11)$$

where $X^{m}_{v} = i$ for $x^{m}_{vi} = 1$ and $X^{\pi}_{i} = j$ for $x^{\pi}_{ij} = 1$. Besides, $1 \le X^{m}_{v} \le I$ and
$1 \le X^{\pi}_{i} \le J$.

The crossover point is randomly chosen between neighbouring genes in the
chromosome X. A mutation is carried out through the random replacement of the
integer value by another one from a feasible discrete set. If the gene $X^{m}_{v}$ is randomly
taken for mutation, then the positive integer value is taken from the set {1, ..., I}.
If the gene $X^{\pi}_{i}$ is randomly chosen, then the value is selected from the set {1, ..., J}.
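
The integer mutation can be sketched in Python as follows. The sketch assumes a chromosome stored as a flat list with the V task genes first and the I node genes after them, and it always replaces a selected gene by a different feasible value; the function and parameter names are illustrative.

```python
import random

def integer_mutation(chromosome, V, I, J, p_m, rng=random):
    """Mutate an integer chromosome of V task genes and I node genes.

    Task genes hold node numbers in 1..I, node genes hold computer
    sorts in 1..J. Each gene is mutated with probability p_m.
    """
    mutated = list(chromosome)
    for pos in range(V + I):
        if rng.random() < p_m:
            upper = I if pos < V else J
            # Replace by another value from the feasible set, if one exists.
            choices = [x for x in range(1, upper + 1) if x != mutated[pos]]
            if choices:
                mutated[pos] = rng.choice(choices)
    return mutated
```

Because every gene stays inside its own feasible set, a mutated chromosome still satisfies constraints (1) and (2) without any repair step.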

The adaptive multicriteria evolutionary algorithm with a tabu mutation is a hybrid
optimisation technique that combines the advantages of a genetic search with those of a tabu
search. In a tabu search certain areas of the space of all possible combinations are
forbidden during the search.

Tabu search algorithms have been applied to several optimisation problems
in scheduling, computer-aided design, quadratic assignment, and the training and
design of neural networks [11]. Moreover, the best results have been obtained by
tabu search algorithms in telecommunication call routing, volume discount acquisition
in production, and vehicle routing. Their good capabilities have been confirmed
on standard optimisation problems such as graph partitioning, graph colouring
and clique partitioning. Tabu search can be treated as a general combinatorial
optimisation technique for use in zero-one programming, nonconvex non-linear
programming, and general mixed integer optimisation.

Tabu search uses memory structures by reference to four principal dimensions,
consisting of recency, frequency, quality and influence [11]. The tabu search algorithm
inherits from a simple descent method the idea of a neighbourhood $N(x^{now})$ of a current
solution $x^{now}$. From this neighbourhood we can choose the next solution $x^{next}$ of the search
trajectory. The accepted alternative should have the best value of the objective
function in the current neighbourhood. However, the descent method terminates its
search when the chosen candidate is worse than the best one from the search
trajectory.

In the tabu search algorithm based on short-term memory, the basic neighbourhood
of a current solution may be reduced to a considered neighbourhood
by maintaining a selective history of the states encountered during
the exploration. Some solutions, which were visited during the most recent period, are
excluded from the basic neighbourhood according to the tabu classification of moves.
Such a solution can be included in the considered neighbourhood only if it satisfies
an aspiration criterion.

A recency-based memory keeps track of solution attributes that have changed in the
recent past. Selected attributes that occur in recently visited solutions are labelled
tabu-active. Solutions containing tabu-active attributes, or particular combinations of
these attributes, become tabu as well. This prevents solutions from the recent part of
the trajectory from belonging to the considered neighbourhood and hence from being
revisited. Other solutions with tabu-active attributes are similarly prevented from
being visited. While the tabu classification strictly refers to solutions that are
forbidden to be visited, we also often refer to moves that lead to such solutions as
being tabu [11].
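
The short-term memory mechanism described above can be illustrated with a minimal sketch. It assumes a bit-flip neighbourhood, a minimised objective, the flipped position as the tabu-active attribute, and "better than the best solution so far" as the aspiration criterion. The function name `tabu_search` and the parameter `tenure` are hypothetical and are not taken from the paper.

```python
from collections import deque


def tabu_search(x0, objective, max_iters=50, tenure=3):
    """Minimise `objective` over bit tuples using a bit-flip neighbourhood."""
    current = tuple(x0)
    best, best_val = current, objective(current)
    # Recency-based memory: positions flipped during the last `tenure` moves.
    tabu = deque(maxlen=tenure)

    for _ in range(max_iters):
        candidates = []
        for i in range(len(current)):
            neighbour = current[:i] + (1 - current[i],) + current[i + 1:]
            val = objective(neighbour)
            # Aspiration criterion (assumed): a tabu move is admitted only
            # if it leads to a solution better than the best found so far.
            if i not in tabu or val < best_val:
                candidates.append((val, i, neighbour))
        if not candidates:
            break
        # Unlike simple descent, the best admissible neighbour is accepted
        # even when it is worse than the current solution.
        val, i, current = min(candidates)
        tabu.append(i)  # position i becomes tabu-active
        if val < best_val:
            best, best_val = current, val
    return best, best_val
```

Because the best neighbour is accepted even when it is worse, the search can leave a local optimum, while the tabu list keeps it from immediately undoing its most recent moves.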

In the tabu search algorithm based on long-term memory, the considered neighbourhood may
also be expanded to incorporate solutions not regularly found in the basic
neighbourhood. During a prolonged, exhaustive search there is an opportunity to collect
frequency measures of selected attributes. Frequently performed moves should be
discouraged, so that rarely performed moves get a chance after a long search. The
frequency measures of selected attributes are taken into account by the function that
selects the next solution from the current neighbourhood.

Hansen has proposed a multiobjective tabu search, MOTS [13], to generate non-dominated
alternatives. The MOTS works with a population of solutions which, through manipulation
of weights, are moved towards the Pareto front [13]. However, the MOTS does not
cooperate with an evolutionary algorithm.

We propose the following approach. The integer mutation is replaced by a tabu search
applied to a randomly chosen chromosome. The fitness of this chromosome is calculated
according to formula (9). The tabu search raises the fitness $f$ from its initial value
to a final value, which is usually better than the initial level. The tabu search
algorithm adds a computational complexity of $O(n)$.

The better outcomes of the tabu mutation translate into improved solution quality for
the adaptive multicriteria evolutionary algorithm with tabu mutation, AMEA+. This
adaptive evolutionary algorithm gives better results than the AMEA and much better
results than the AMGA (Fig. 2). After 200 generations, the average level of convergence
to the Pareto set is 0.8% for the AMEA+, 1.3% for the AMEA, and 43% for the AMGA. Thirty
preliminary test populations were prepared, and each algorithm was started 30 times from
these populations.

With the integer constrained coding of chromosomes, there are 12 decision variables in
the test optimisation problem. The search space consists of only 25 600 solutions.

Fig. 2. Outcome convergence for the AMEA+ and the AMEA



8 Evolution Strategy 

An evolution strategy [20] for multi-criterion optimisation was introduced by Kursawe
[16]. This class of optimisation techniques has been discussed and extended in [3] and
[15]. We propose a chromosome for the multicriteria evolution strategy MES for problem
(8) that consists of two main parts:

$\mathcal{X} = (X, \sigma)$, (12)

where

$X$ - the integer decision variable vector given by (11),

$\sigma$ - the standard deviation vector for $X$.

The new population is created from the $\mu$ individuals of the current generation in
three steps. In step 1, $\lambda$ individuals are randomly chosen from the current
population for the temporary parent set. The selection rate is the same for each
individual in the current population. The number of parents $\lambda$ is even. The
proposed algorithm is therefore a $(\mu+\lambda)$ version of an evolution strategy [16].

In the next step, the crossover operator is carried out by gene recombination between
two task assignments selected randomly from the temporary parent set. Each mth gene of
the offspring is taken either as the mth gene of parent A or as the mth gene of
parent B.

In the last step, the temporary offspring set is transformed by the strategic mutation
operation. It changes the value of each decision variable $x_m$ of every offspring by
adding a random value $\Delta x_m$ drawn from a normal distribution whose standard
deviation is the corresponding component of $\sigma$; $\Delta x_m$ is rounded to an
integer. After that, the standard deviations are changed in an analogous way. From the
extended set consisting of the $\mu$ individuals of the previous population and the
$\lambda$ mutated offspring, $\mu$ individuals are chosen by an elite selection
according to their fitness values.
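
The three steps above can be sketched as follows. This is only an illustration under stated assumptions: the normal distribution is taken to have zero mean, the rule for changing the standard deviations (additive noise with a floor of 0.1) is invented for the example because the text only calls it "analogous", variables are clamped to hypothetical bounds, and higher fitness is treated as better. The name `es_generation` is hypothetical.

```python
import random


def es_generation(population, fitness, lam, rng, lower=0, upper=10):
    """One (mu + lambda) generation; individuals are (x, sigma) pairs of lists."""
    mu = len(population)
    # Step 1: every individual has the same chance of entering the parent set.
    parents = [rng.choice(population) for _ in range(lam)]

    offspring = []
    for _ in range(lam):
        (xa, sa), (xb, sb) = rng.sample(parents, 2)
        # Step 2: each gene is copied from parent A or from parent B.
        picks = [rng.random() < 0.5 for _ in xa]
        x = [a if p else b for p, a, b in zip(picks, xa, xb)]
        sigma = [a if p else b for p, a, b in zip(picks, sa, sb)]
        # Step 3: add rounded Gaussian noise (zero mean is an assumption)
        # and clamp the result to the assumed variable bounds.
        x = [min(max(v + round(rng.gauss(0.0, s)), lower), upper)
             for v, s in zip(x, sigma)]
        # Assumed adaptation rule for the standard deviations.
        sigma = [max(0.1, s + rng.gauss(0.0, 0.1)) for s in sigma]
        offspring.append((x, sigma))

    # Elite selection over the old population and the offspring together.
    merged = population + offspring
    return sorted(merged, key=lambda ind: fitness(ind[0]), reverse=True)[:mu]


if __name__ == "__main__":
    rng = random.Random(0)
    pop = [([rng.randint(0, 10) for _ in range(4)], [1.0] * 4) for _ in range(6)]
    pop = es_generation(pop, lambda x: -sum((v - 7) ** 2 for v in x), 4, rng)
    print(pop[0])
```

Because selection is made over parents and offspring together, the best fitness in the population can never decrease from one generation to the next.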

The selection to the temporary parent set, the gene recombination, and the strategic
mutation are repeated until the maximal number of generations is exceeded. Experiments
with the MES have shown solution quality comparable with that of the adaptive
evolutionary algorithm. For the benchmark optimisation problem, this evolution strategy
gives slightly worse results than the AMEA+ (Fig. 3).




Fig. 3. Minimization of the average level by the MES and the AMEA+

The other experimental results confirm that the outcome quality of the multi-criterion
evolution strategy is close to that of the adaptive multicriteria evolutionary algorithm
with tabu mutation. For the benchmark instance, the maximal level of convergence to the
Pareto set was 5.6% for the MES versus 4.2% for the AMEA+. The average number of proper
optimal solutions was 48.4% for the MES and 53.6% for the AMEA+.

For the instance with 15 tasks, 4 nodes, and 5 computer types there are 80 binary
decision variables. The average level of convergence to the Pareto set is 15.7% for the
MES and 16.4% for the AMEA+. The maximal level is 27.5% for the MES and 25.6% for the
AMEA+. For this instance the average number of optimal solutions is 16.5% for the MES
and 17.1% for the AMEA+.

The average level of convergence to the Pareto set, the maximal level, and the average
number of optimal solutions become worse when the number of tasks, the number of nodes,
and the number of computer types increase. The average level is 34.6% for the MES versus
33.7% for the AMEA+ if the instance includes 50 tasks, 4 nodes, 5 computer types, and
220 binary decision variables.

9 Concluding Remarks

The adaptive evolutionary algorithm with tabu mutation, AMEA+, is proposed as a
technique for finding Pareto-optimal task allocations in a three-objective optimisation
problem: minimisation of the workload of the bottleneck computer, minimisation of the
cost of the computers, and maximisation of the distributed system performance.

Tabu search algorithms such as the MOTS can be used for the local improvement of
non-dominated solutions in a population. However, a tabu search can also improve an
offspring according to a single criterion. Another strategy is to use the AMEA to find
an initial population for a multicriteria tabu search algorithm. Our future work will
concern the development of combinations of tabu search and evolutionary algorithms for
finding Pareto-optimal solutions.

Abstract. Evolutionary optimization algorithms work with a population of solutions
instead of a single solution. Since multi-objective optimization problems give rise to
a set of Pareto-optimal solutions, evolutionary optimization algorithms are ideal for
handling multi-objective optimization problems. Many years of research and application
studies have produced a number of efficient multi-objective evolutionary algorithms
(MOEAs), which are ready to be applied to real-world problems. In this paper, we propose
a practical approach which enables a user to move closer to the true Pareto-optimal
front and simultaneously reduce the size of the obtained non-dominated solution set. The
efficacy of the proposed approach is demonstrated by solving a number of mechanical
shape optimization problems, including a simply-supported plate design, a cantilever
plate design, a hoister design, and a bicycle frame design. The results are interesting
and suggest immediate application of the proposed technique to more complex engineering
design problems.



1 Introduction 



Over the last decade or so, a number of multi-objective optimization techniques using
evolutionary algorithms have been suggested. The outcome of these studies is that
different multi-objective optimization problems can be solved for the purpose of finding
multiple Pareto-optimal solutions in one single simulation run. The classical means of
finding one solution at a time with a weight vector or a similar approach requires a
priori knowledge of the weight vector and needs to be run many times, hopefully finding
a different Pareto-optimal solution each time. In addition to converging close to or on
the true Pareto-optimal set, multi-objective evolutionary algorithms (MOEAs) are capable
of finding a widely distributed set of solutions.

In this paper, we suggest a hybrid technique to take evolutionary multi-objective
optimization procedures one step closer to practice. Specifically, in a real-world
problem, we would like to ensure a better convergence to the true Pareto-optimal front
and would also like to reduce the number of obtained non-dominated solutions to a
reasonable size. The solutions obtained by an MOEA are modified using a local search
method, in which a weighted objective function is minimized. Starting a local search
from the MOEA solutions allows a better convergence to the true Pareto-optimal front. A
clustering method is suggested in general to reduce the size of the obtained set of
solutions. For finite search space problems, the local search approach may itself reduce
the size of the obtained set.

A specific MOEA, the elitist non-dominated sorting GA (NSGA-II), and a hill-climbing
local search method are used together to solve a number of engineering shape
optimization problems for two objectives. Minimizing the weight of a structure and
minimizing the maximum deflection of the structure have conflicting solutions. When
these two objectives are considered together in a design, a number of Pareto-optimal
solutions result. With the presence or absence of small constituent elements represented
in a binary string, NSGA-II uses an innovative crossover operator which seems to help in
combining good partial solutions to form bigger partial solutions. The finite element
method is used to evaluate a string representing a shape. The paper shows how the
proposed hybrid technique can find a number of solutions with different trade-offs
between weight and deflection. On a cantilever plate design, a simply-supported plate
design, a hoister plate design, and a bicycle frame design problem, the proposed
technique finds interesting and well-engineered solutions. These results indicate that
the proposed hybrid technique is ready to be applied to more complex engineering shape
design problems.



2 Hybrid Approach 

It has been established elsewhere that NSGA-II is an efficient procedure for finding a
wide-spread as well as well-converged set of solutions in a multi-objective optimization
problem. NSGA-II uses (i) a faster non-dominated sorting approach, (ii) an elitist
strategy, and (iii) no niching parameter. It has been shown elsewhere [2] that the above
procedure has $O(MN^2)$ computational complexity. Here, we take NSGA-II a step closer
to practice by

1. ensuring convergence closer to the true Pareto-optimal front, and

2. reducing the size of the obtained non-dominated set. 

We illustrate both the above issues in the following subsections. 

2.1 Converging Better 

In a real-world problem, the Pareto-optimal front is usually not known. Although NSGA-II
has demonstrated good convergence properties on test problems, we enhance the
probability of its true convergence by using a hybrid approach. A local search is
started from each solution obtained by NSGA-II in order to find a better solution. Since
a local search strategy requires a single objective function, a weighted objective, a
Tchebycheff metric, or any other metric which converts multiple objectives into a single
objective can be used. In this study, we use a weighted objective:

$$F(\mathbf{x}) = \sum_{j=1}^{M} \bar{w}_j^{\mathbf{x}} f_j(\mathbf{x}), \qquad (1)$$

where the weights are calculated from the obtained set of solutions in a special way.
First, the minimum $f_j^{\min}$ and maximum $f_j^{\max}$ values of each objective
function $f_j$ are noted. Thereafter, for any solution $\mathbf{x}$ in the obtained set,
the weight for each objective function is calculated as follows:

$$\bar{w}_j^{\mathbf{x}} = \frac{\left(f_j^{\max} - f_j(\mathbf{x})\right) / \left(f_j^{\max} - f_j^{\min}\right)}{\sum_{k=1}^{M} \left(f_k^{\max} - f_k(\mathbf{x})\right) / \left(f_k^{\max} - f_k^{\min}\right)}.$$

In the above calculation, minimization of objective functions is assumed. When a
solution $\mathbf{x}$ is close to the individual minimum of the function $f_j$, the
numerator becomes close to one, causing a large value of the weight for this function.
For an objective which has to be maximized, the term $(f_j^{\max} - f_j(\mathbf{x}))$
needs to be replaced with $(f_j(\mathbf{x}) - f_j^{\min})$. The division of the
numerator by the denominator ensures that the calculated weights are normalized, that
is, $\sum_{j=1}^{M} \bar{w}_j^{\mathbf{x}} = 1$. Once the pseudo-weights are calculated,
the local search procedure is simple: begin the search from each solution $\mathbf{x}$
independently with the purpose of optimizing $F(\mathbf{x})$. Figure 1 illustrates this
procedure. Since the pseudo-weight vector w dictates roughly the priority of different
objective functions at that solution, optimizing $F(\mathbf{x})$ will produce a
Pareto-optimal or a near Pareto-optimal solution. This is true for convex Pareto-optimal
regions. However, for non-convex Pareto-optimal regions, there exists no weight vector
corresponding to Pareto-optimal solutions in certain regions. Thus, a different metric,
such as the Tchebycheff metric, can be used in those cases. Nevertheless, the overall
idea is that once NSGA-II finds a set of solutions close to the true Pareto-optimal
region, we use a local search technique from each of these solutions with a differing
emphasis on the objective functions in the hope of converging better to the true
Pareto-optimal front. Since independent local searches are started from each solution
obtained using an MOEA, the optimized solutions obtained by the local search method need
not all be non-dominated with respect to each other. Thus, we find the non-dominated set
of solutions from the obtained set of solutions before proceeding further. Other studies
use the local search method during a GA run: each solution is modified with a local
search method before it is included in the population. The proposed approach is likely
to have a lower computational cost; however, a comparison between the two approaches is
a matter for future research.

Fig. 1. The local search technique may find better solutions.
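
The pseudo-weight calculation and the weighted objective can be sketched in a few lines. The sketch assumes that all objectives are minimised and that every objective takes at least two distinct values in the obtained set, so that no denominator is zero. The function names are hypothetical.

```python
def pseudo_weights(objs):
    """Return one normalised weight vector per objective vector in `objs`."""
    m = len(objs[0])
    f_min = [min(f[j] for f in objs) for j in range(m)]
    f_max = [max(f[j] for f in objs) for j in range(m)]
    weights = []
    for f in objs:
        # Normalised distance of each objective from its worst (maximum) value.
        raw = [(f_max[j] - f[j]) / (f_max[j] - f_min[j]) for j in range(m)]
        total = sum(raw)
        weights.append([r / total for r in raw])
    return weights


def weighted_objective(f, w):
    """Weighted sum F(x) of the objective values `f` with weights `w`."""
    return sum(wj * fj for wj, fj in zip(w, f))
```

For the three objective vectors (1, 9), (5, 5) and (9, 1), the extreme solutions receive the weight vectors (1, 0) and (0, 1), and the middle solution receives (0.5, 0.5), so each local search emphasises the objective in which its starting solution is already strong.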

The complete procedure of the proposed hybrid strategy is shown in Figure 2. Starting
from the MOEA results, we first apply a local search technique, followed by a
non-domination check. After the non-dominated solutions are found, a clustering
technique is used to reduce the size of the optimal set, as discussed in the next
subsection.
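
The non-domination check in this procedure can be sketched as a simple pairwise filter. The sketch assumes that all objectives are minimised; the function names are hypothetical, and the quadratic-time loop is chosen for clarity rather than speed.

```python
def dominates(a, b):
    """True if `a` is no worse than `b` everywhere and strictly better somewhere."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def non_dominated(points):
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]
```

Applied to the points (1, 5), (2, 2), (3, 3) and (5, 1), the filter removes only (3, 3), which is dominated by (2, 2).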




Fig. 2. The proposed hybrid procedure of using a local search technique, a non-domination check, 
and a clustering technique is illustrated. 



2.2 Reducing the Size of Non-dominated Set 

In an ideal scenario, a user is interested in finding a good spread of non-dominated
solutions close to the true Pareto-optimal front. From a practical standpoint, the user
would be interested in a handful of solutions (in most cases, 5 to 10 solutions are
probably enough). Interestingly, most MOEA studies use a population of size 100 or more,
thereby finding about 100 different non-dominated solutions. The interesting question to
ask is ‘Why are MOEAs set to find many more solutions than desired?’

The answer is fundamental to the working of an EA. The population size required in an EA
depends on a number of factors, such as the number of decision variables and the
complexity of the problem. The population cannot be sized according to the desired
number of non-dominated solutions in a problem. Since most interesting problems are
complex and have a large number of decision variables, the population sizes used in
solving them can be in the hundreds. Such a population size is mandatory for the
successful use of an EA. The irony is that when an MOEA works well with such a
population size N, it eventually finds N different non-dominated solutions, particularly
if the niching mechanism used in the MOEA is good. Thus, we need to devise a separate
procedure for identifying a handful of solutions from the large obtained set of
non-dominated solutions.

One approach would be to use a clustering technique, similar to that used elsewhere, for
reducing the size of the obtained non-dominated set of solutions. In this technique,
each of the N solutions is initially assumed to belong to a separate cluster.
Thereafter, the distance $d_c$ between all pairs of clusters is calculated by first
finding the centroid of each cluster and then calculating the Euclidean distance between
the centroids. The two clusters having the minimum distance are merged into a bigger
cluster. This procedure is continued until the desired number of clusters remains.
Finally, in each remaining cluster, the solution closest to the centroid of the cluster
is retained and all other solutions are deleted. This is how the clusters are merged and
the cardinality of the solution set is reduced. Figure 3 shows the MOEA solution set in
open boxes and the reduced set in solid boxes. Care may be taken to choose the extreme
solutions in the extreme clusters.
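
The clustering steps above can be sketched as follows. The sketch works on points in objective space and uses a plain search over all cluster pairs, which is adequate for the small sets considered here. The function names are hypothetical, and the special treatment of extreme solutions mentioned above is not included.

```python
import math


def centroid(cluster):
    """Component-wise mean of the points in a cluster."""
    n = len(cluster)
    return tuple(sum(p[j] for p in cluster) / n for j in range(len(cluster[0])))


def reduce_by_clustering(points, k):
    """Merge the closest clusters until k remain; keep one point per cluster."""
    clusters = [[p] for p in points]  # every solution starts as its own cluster
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = math.dist(centroid(clusters[i]), centroid(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        # j > i, so removing cluster j does not shift the position of cluster i.
        clusters[i].extend(clusters.pop(j))
    # Retain the solution closest to the centroid of each remaining cluster.
    return [min(c, key=lambda p: math.dist(p, centroid(c))) for c in clusters]
```

Every retained point is one of the original solutions, so the reduced set still consists of actual designs rather than averaged ones.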



Fig. 3. The clustering method of reducing the set of non-dominated solutions is illustrated.



However, in many problems the local search strategy itself can reduce the cardinality of
the obtained set of non-dominated solutions. This will particularly happen in problems
with a discrete search space. For two closely located solutions, the pseudo-weight
vectors may not be very different. Thus, when a local search procedure is started from
each of these solutions (which are close to each other) with an F(x) which is also
similar, the resulting optimum solutions may be identical in a discrete search space
problem. The solutions a and b in Figure 1 are close, and after the local search
procedure they may converge to the same solution A. Thus, for many solutions obtained
using NSGA-II, the resulting optimum obtained using the local search method may be the
same, and the local search procedure itself may reduce the size of the obtained
non-dominated set in problems with a finite search space. Figure 2 shows that clustering
is the final operation of the proposed hybrid strategy.

3 Engineering Shape Design



With the advent of evolutionary algorithms as an alternative optimization method, there
exist a number of applications of optimal shape design in which shapes are evolved by
deciding the presence or absence of a number of small elements. A predefined area (or
volume) is divided into a number of small regular elements. The task of an evolutionary
optimization procedure is to find which elements should be kept and which should be
thrown away so that the resulting shape is optimal with respect to an objective
function. This procedure has a number of advantages:



1. The numerical finite element method (or boundary element method) is the usual way of
analyzing an engineering component. Since the finite element procedure requires the
component to be divided into a number of small elements, this approach saves one
computation step and is complementary to the usual finite element method.

2. Since no a priori knowledge about the shape is required, this method does not have
any bias from the user.

3. By simply using three-dimensional elements, the approach can be extended to
three-dimensional shape design problems.

4. The number and shape of holes in a component can evolve naturally without being
explicitly fixed by the user.

Most studies of this method, including the studies with evolutionary algorithms, have 
concentrated on optimizing a single objective. In this study, we apply this evolutionary 
procedure for multiple conflicting objectives. 



3.1 Representation 

In this study, we consider two-dimensional shape design problems only. However, the
procedure can be easily extended to three-dimensional shape design problems as well. We
begin with a rectangular plate, describing the maximum overall region to which the
shapes will be confined. Thereafter, we divide the rectangular plate into a finite
number of small elements (refer to Figure 4). We consider square elements here, although
any other shape, including triangular or rectangular elements, can also be considered.
Since the presence or absence of every element is a decision variable, we use a binary
coding to describe a shape. For the shape shown in Figure 5, the corresponding binary
coding is as follows:

01110 11111 10001 11111

The presence of an element is denoted by a 1 and its absence by a 0. A left-to-right
coding procedure, as shown in Figure 4, is adopted here. In order to smoothen the
stair-case-like shape denoted by the basic skeleton representation, we add triangular
elements (shown shaded) for the different cases in Figure 7. The skeleton shape shown in
Figure 5 then represents the true shape shown in Figure 6.

Fig. 4. Rectangular plate divided into small elements.

Fig. 5. The skeleton of a shape.

Fig. 6. Final smoothened shape.




Fig. 7. Different cases of smoothening through triangular elements. 



3.2 Evaluation 

Once the shape is smoothened, it is further divided into smaller elements. All interior
rectangular elements are divided into two triangles and all boundary elements (including
elements around a hole) are divided into four small triangles. Even the boundary
triangles used for smoothening are divided into smaller triangles. The shape is
evaluated by finding the maximum stress and deflection developed at any point in the
component by the application of the specified loads. Since no connectivity check is made
while creating a new string or while creating the initial random population, a string
may represent a number of disconnected regions in the rectangle. In this case, we
proceed with the biggest cluster of connected elements (where two elements are defined
to be connected if they have at least one common corner). The string is repaired by
assigning a 0 to all elements which are not part of the biggest cluster.
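
The repair step can be sketched with a flood fill over the element grid. The sketch assumes a row-by-row, left-to-right string layout and treats elements sharing a side or a corner as connected; when two clusters have the same size, the one found first is kept, which is an arbitrary choice of this example. The function name `repair` is hypothetical.

```python
def repair(bits, rows, cols):
    """Zero every element that is not in the biggest corner-connected cluster."""
    grid = [list(bits[r * cols:(r + 1) * cols]) for r in range(rows)]
    seen, biggest = set(), set()
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != 1 or (r, c) in seen:
                continue
            # Flood fill over the 8 neighbours (common side or common corner).
            cluster, stack = set(), [(r, c)]
            seen.add((r, c))
            while stack:
                cr, cc = stack.pop()
                cluster.add((cr, cc))
                for dr in (-1, 0, 1):
                    for dc in (-1, 0, 1):
                        nr, nc = cr + dr, cc + dc
                        if (0 <= nr < rows and 0 <= nc < cols
                                and grid[nr][nc] == 1 and (nr, nc) not in seen):
                            seen.add((nr, nc))
                            stack.append((nr, nc))
            if len(cluster) > len(biggest):
                biggest = cluster
    return [1 if (i // cols, i % cols) in biggest else 0
            for i in range(rows * cols)]
```

A string whose elements already form a single connected shape is returned unchanged.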

In all applications here, two conflicting objectives are chosen: weight and deflection.
These two objectives are conflicting because a minimum weight design is usually not
stiff and produces a large deflection, whereas a minimum deflection design has densely
packed elements, thereby causing a large weight of the overall component. The maximum
stress and deflection values are restricted to lie within the specified limits of the
design by using them as constraints.






K. Deb and T. Goel 



4 Simulation Results 



To show the efficacy of the proposed hybrid multi-objective optimization procedure in
solving optimal shape design problems, we use a number of mechanical component design
problems. Since binary-coded strings are used to represent a shape, we use a bit-wise
hill-climbing strategy as the local search operator. The procedure is simple. Starting
from the left of the string, every bit is flipped to see if it improves the design. If
it does, the flipped bit is retained; otherwise the bit is left unchanged. This
procedure is continued until a complete pass over the entire string length results in no
improvement.
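
The bit-wise hill-climbing procedure can be sketched as below. The sketch assumes that `objective` is a single value to be minimised, such as the weighted objective F(x); in the study each evaluation would involve a finite element analysis, which is replaced by an arbitrary callable here. The function name `hill_climb` is hypothetical.

```python
def hill_climb(bits, objective):
    """Flip bits left to right, keeping a flip only if it lowers the objective."""
    bits = list(bits)
    best = objective(bits)
    improved = True
    while improved:  # stop after a full pass without any improvement
        improved = False
        for i in range(len(bits)):
            bits[i] ^= 1  # tentatively flip bit i
            val = objective(bits)
            if val < best:
                best = val
                improved = True
            else:
                bits[i] ^= 1  # no improvement: restore the bit
    return bits, best
```

The search ends in a solution that no single bit flip can improve, which is a local optimum with respect to the one-bit neighbourhood.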

Since the shapes are represented in a two-dimensional grid, we introduce a new crossover
operator which respects the rows or columns of the two parents. Whether to swap rows or
columns is decided with a probability of 0.5. Each row or column is swapped with a
probability of 0.95/d, where d is the number of rows or columns, as the case may be. In
this way, on average almost one row or column is swapped between the parents. A bit-wise
mutation with a probability of 1/string-length is used. NSGA-II is run for 150
generations. It is important to highlight that NSGA-II does not require any extra
parameter setting. In all problems, a population of size 30 is used.
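
A minimal sketch of the row/column crossover, assuming each parent is stored as a list of rows. With d rows (or columns), each is swapped with probability 0.95/d, so the expected number of swaps per crossover is d × 0.95/d = 0.95. The function name and the `p_total` parameter are hypothetical.

```python
import random


def row_col_crossover(parent_a, parent_b, rng, p_total=0.95):
    """Swap whole rows or whole columns between copies of two grid parents."""
    a = [row[:] for row in parent_a]
    b = [row[:] for row in parent_b]
    rows, cols = len(a), len(a[0])
    if rng.random() < 0.5:
        # Row mode: each row is exchanged with probability p_total / rows.
        for r in range(rows):
            if rng.random() < p_total / rows:
                a[r], b[r] = b[r], a[r]
    else:
        # Column mode: each column is exchanged with probability p_total / cols.
        for c in range(cols):
            if rng.random() < p_total / cols:
                for r in range(rows):
                    a[r][c], b[r][c] = b[r][c], a[r][c]
    return a, b


if __name__ == "__main__":
    rng = random.Random(0)
    zeros = [[0] * 4 for _ in range(3)]
    ones = [[1] * 4 for _ in range(3)]
    print(row_col_crossover(zeros, ones, rng))
```

Swapping complete rows or columns keeps contiguous parts of a shape together, which is consistent with the stated aim of combining good partial solutions.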

For all problems, we use the following material properties: 



Plate thickness    50 mm
Yield strength     150 MPa
Young’s modulus    200 GPa
Poisson’s ratio    0.25



4.1 Cantilever Plate Design 

First, we consider a cantilever plate design problem, where an end load P = 10 kN is
applied as shown in Figure 8. The rectangular plate of size 60 × 100 mm² is divided into
60 small rectangular elements. Thus, 60 bits are used to construct a binary string
representing a shape.

Fig. 8. The loading and support of the cantilever plate are shown.



Figure 9 shows the four steps of the proposed hybrid method in designing the cantilever
plate. The first plot shows the non-dominated solutions obtained using NSGA-II. Since
the population size is 30, NSGA-II is able to find 30 different non-dominated solutions.
Thereafter, the local search method is applied from each non-dominated solution and a
new and improved set of solutions is obtained. The third plot is the result of the
non-domination check of the solutions obtained after the local search method. Three
dominated solutions are eliminated by this process. The final plot is obtained after the
clustering operation with a choice of nine solutions. The plot shows how a
well-distributed set of nine solutions is found from the 27 solutions of the third plot.
If fewer than nine solutions are desired, the clustering mechanism can be set
accordingly.

Fig. 9. Hybrid procedure to find nine trade-off solutions for the cantilever plate design problem.

In order to visualize the obtained set of nine solutions having a wide range of
trade-offs in the weight and scaled deflection values, we show the shapes in Figure 10.
It is clear how, starting from a low-weight solution (with large deflection),
large-weight shapes (with small deflection) are found by the hybrid method. It is
interesting to note that the minimum weight solution eliminated one complete row (the
bottom-most row) in order to reduce the overall weight. The second solution (the element
(1,2) in the 3×3 matrix of Figure 10) corresponds to the second-best weight solution. It
is well known that for an end-loaded cantilever plate, a parabolic shape is optimal.
Both shapes (elements (1,1) and (1,2)) exhibit a similar shape. As the importance of
deflection increases, the shapes tend to retain more and more elements, thereby making
the plate rigid enough to have a smaller deflection. In the middle, the development of a
vertical stiffener is interesting. This is a compromise between the minimum weight
solution and the minimum deflection solution. By adding a stiffener, the weight of the
structure does not increase much, whereas the stiffness of the plate increases (hence
the deflection reduces). Finally, the complete plate with the right top and bottom ends
chopped off is the minimum deflection solution.

Fig. 10. Nine trade-off shapes for the cantilever plate design.

We would like to reiterate here that the above nine solutions are not the results of
multiple runs of a multi-objective optimization algorithm. All nine solutions (and more,
if needed) with interesting trade-offs between weight and deflection are obtained in one
simulation run of the hybrid method.

4.2 Simply-Supported Plate Design 

Next, we consider a simply-supported plate design, starting from a rectangular plate of
the same dimensions as in the previous design. The plate rests on two supports as shown
in Figure 11, and a vertical load P = 10 kN acts on the top-middle node of the plate.

Fig. 11. The loading and support of the simply-supported plate are shown.

Figure 12 shows the non-dominated solutions obtained using NSGA-II. After the local
search method, the obtained non-dominated solutions have a wider distribution. The
number of solutions is reduced from 30 to 22 by the non-domination check. Finally, the
clustering algorithm finds nine widely separated solutions from the 22 non-dominated
solutions. The shapes of these nine solutions are shown in Figure 13. The minimum weight
solution tends to use one row (the top-most row) less, but since the load acts on the
top of the plate, one element is added to transfer the load to the plate. The third
solution (shown in position (1,3) of the matrix) is interesting. A careful look at
Figure 12 reveals that this solution is a ‘knee’ solution. Moving away from it to gain a
small advantage in weight requires a large sacrifice in deflection; similarly, gaining a
small advantage in deflection requires a large sacrifice in weight. The shapes in
positions (1,2) and (2,1) can be compared with the shape in position (1,3). The shape in
position (3,1), solution 7, is also interesting: to reduce the deflection further, the
two slanted arms need to be stiffened. Finally, the absolute minimum deflection shape is
the complete rectangle with the maximum possible weight.

Fig. 12. Hybrid procedure finds nine trade-off solutions for the simply-supported plate
design problem.

Starting with the minimum weight design having two slim slanted legs, then thickening
the legs to make them stiff, followed by joining the legs with a stiffener, and finally
finding the complete rectangular plate having minimum deflection are all intuitive
trade-off solutions. In the absence of any such knowledge, it is interesting how the
hybrid procedure with NSGA-II is able to find the whole family of different trade-off
solutions.

Fig. 13. Nine trade-off shapes for the simply-supported plate design.



4.3 Bicycle Frame Design 

Finally, we attempt to design a bicycle frame for a vertical load of 10 kN applied at A
in Figure 14. The specifications are similar to those used elsewhere. The plate is 20 mm
thick and the design is restricted to lie within the area shown in Figure 14. The frame
is supported at two places, B and C. The point B marks the position of the axle of the
rear wheel and the point C is the location of the handle support. The filled element is
the location of the pedal assembly and is always present. The material yield stress is
140 MPa, Young’s modulus is 80 GPa and Poisson’s ratio is 0.25. The maximum allowed
displacement is 5 mm.

Figure 15 shows the NSGA-II solutions and the corresponding solutions obtained by the
hybrid approach. Here, we are interested in finding four different trade-off solutions.

These four solutions obtained by NSGA-II are shown mounted on a sketch of a 
bicycle in Figure [El The top-left solution is the minimum weight design. The second 
solution joins the two vertical legs to make the structure more stiff. The other two 
solutions make the legs more thick in order to increase the stiffness of the frame. The 
interior hole and absence of top-left elements are all intuitive. The proposed hybrid 
approach can evolve such solutions without these knowledge and mainly by finding and 
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Fig. 14. The hybrid procedure is illustrated for the bicycle frame design. 




Weight 



Fig. 15. The hybrid procedure is illustrated for the bicycle frame design. 



maintaining trade-off solutions among weight and deflection. The presence of many such 
solutions with different trade-offs between weight and stiffness provides a plethora of 
information about various types of design. 



5 Conclusion 



The hybrid multi-objective optimization technique proposed in this paper uses a
combination of a multi-objective evolutionary algorithm (MOEA) and a local search
operator. The proposed technique ensures a better convergence of MOEAs to the true
Pareto-optimal region and helps in finding a small set of diverse solutions for practical
reasons.

The efficacy of the proposed technique is demonstrated by solving a number of
engineering shape design problems for two conflicting objectives: weight of the structure
and maximum deflection of the structure. In all cases, the proposed technique has been
shown to find a set of four to nine diverse solutions better converged than an MOEA
alone. The results are encouraging and take the evolutionary multi-objective
optimization approach much closer to practice.

Fig. 16. Four trade-off shapes for the bicycle frame design.

Acknowledgements. Authors acknowledge the support provided by Ministry of Human 

Abstract. Topology design of enterprise networks is a hard combinatorial
optimization problem. It has numerous constraints, several objec-
tives, and a very noisy solution space. Besides the NP-hard nature of 
this problem, many of the performance metrics of the network can only 
be estimated, given their dependence on many of the dynamic aspects of 
the network, e.g., routing and number and type of traffic sources. Fur- 
ther, many of the desirable features of a network topology can best be 
expressed in linguistic terms, which is the basis of fuzzy logic. In this 
paper, we present a fuzzy evolutionary hybrid metaheuristic for network 
topology design. This approach is dominance preserving and scales 
well with larger problem instances and a larger number of objective cri- 
teria. Experimental results are provided. 



1 Introduction 

A typical enterprise network provides communication services to a large num- 
ber of hosts, such as mainframe computers, mini systems, workstations, PCs, 
printers, etc., p. Network active elements such as routers, switches, and hubs 
are used to interconnect these computers and peripherals. The network topology 
is governed by several constraints. Geographical constraints dictate the break- 
down of such internetworks into smaller parts or groups of nodes, where each 
group makes up what is called a LAN. A LAN consists of all network elements 
which do not include routers or layer-3 switches. Routers delineate the bound- 
aries of LANs. Communication services of a modern organization are centered 
around a structured campus network, which consists of a backbone interconnect- 
ing a number of LANs via routers or layer-3 switches. Further, the nodes of a 
LAN may be subdivided into smaller parts, called LAN segments (see Fig. 1).
Overdimensioning a network is easy; however, designing a cost-optimized net- 
work is always very hard. Hardness is a function of the size, the constraints, and 
obviously the cost parameters to trade off. Furthermore, with many cost parameters
and constraints, the notion of optimality is not clear. A more reasonable
approach is to seek a solution that possesses a set of desirable properties and does
not violate some well established design principles. Examples of these principles
are: 

Fig. 1. A typical Campus Network (WS represents workgroup switch).



— There is a physical path between any two nodes. 

— The number of hops between any two stations does not exceed a given thresh- 
old. 

— Only a given small fraction of links have utilization levels below some thresh- 
old. 

A category of algorithms that has been found to be effective for such problems
is iterative metaheuristics. These walk the state space of solutions
while evaluating each solution against any desirable set of properties. These
metaheuristics are characterized by a hill-climbing property that allows
occasional acceptance of inferior solutions 0. Heuristics like genetic algorithm |3|,
simulated annealing, tabu search, simulated evolution |E|, and stochastic
evolution are examples of stochastic iterative heuristics. Detailed descriptions
of these heuristics can be found in |2I, and an interesting classification of some
of them is given in jS] .

In this work we propose a hybrid meta-heuristic for the topology design prob- 
lem which follows the search strategy of Simulated Evolution (SE) algorithm. SE 
is a memoryless meta-heuristic, where the walk through the state space is heav- 
ily influenced by the allocation operator. The memoryless nature of the search 
usually results in partial revisiting of areas of the state space. To minimize the 
effect of such undesirable behavior, the allocation step of SE is implemented
following a tabu search approach.



2 Background 

Many combinatorial optimization problems can be formulated as follows 0:
Given a finite set $M$ of distinct movable elements and a finite set $L$ of locations,
a state is defined as an assignment function $S : M \to L$ satisfying certain
constraints. The topology design problem fits this generic model. For this problem,
we are given a set of links $E = \{e_1, e_2, \ldots, e_n\}$ and a set of locations $L = \{0, 1\}$, where
$L(e_i) = 1$ iff link $e_i$ belongs to the topology and $L(e_i) = 0$ otherwise. We seek
to find an assignment $S : E \to L$ which corresponds to a feasible topology with
desirable properties.

Unlike constructive algorithms, which produce a solution only at the end of 
the design process, iterative algorithms produce numerous solutions during the 
course of their search. In order to compare alternative topologies, the cost of 
each topology is estimated for the objectives under consideration. Important ob- 
jectives are the minimization of monetary cost, network latency, and maximum 
number of hops between any source-destination pair. Most of the objectives and 
constraints depend on several aspects such as network flow dynamics, technology 
trends, strategic commercial goals, etc., that can best be expressed in linguistic 
terms, which is the basis of fuzzy logic. In this work, the cost function, con- 
straints, as well as some of the SE algorithm operators are implemented using 
fuzzy algebra [B|. 

2.1 SE Algorithm 

Simulated Evolution (SE) is a stochastic evolutionary search strategy that falls 
in the general category of meta-heuristics. It was first proposed by Kling and 
Banerjee in jOI. SE adopts the generic state model described above, where a 
solution is seen as a population of movable elements. 

Starting from a given initial solution, SE repetitively executes the following 
three steps in sequence: evaluation, selection, and allocation, until certain 
stopping conditions are met. The pseudo-code of the SE algorithm is given in
Fig. 2. The evaluation step estimates the goodness of each element in its
current location. The goodness of an element is a ratio of its optimum cost to its 
actual cost estimate, and therefore belongs to the interval [0,1]. It is a measure 
of how near each element is to its optimum position. The higher the goodness 
of an element, the closer is that element to its optimum location with respect 
to the current configuration. In the selection step, the algorithm probabilistically
selects elements for relocation. Elements with low goodness values have higher 
probabilities of getting selected. A selection bias (B) is used to compensate for 
errors made in the estimation of goodness. Its objective is to inflate or deflate the 
goodness of elements. A high positive value of bias decreases the probability of 
selection and vice versa. Large selection sets also degrade the solution quality due 
to uncertainties created by large perturbations. Similarly, for high bias values 
the size of the selection set is small, which degrades the quality of solution due 
to limitations of the algorithm to escape local minima. A carefully tuned bias 
value results in good solution quality and reduced execution time jH). 

Elements selected during the selection step are assigned to new locations
in the allocation step with the hope of improving their goodness values, and
thereby reducing the overall cost of the solution. Allocation is the step that has
the most impact on the quality of the search performed by the SE algorithm. A
completely random allocation makes the SE algorithm behave like a random walk.
Therefore, this operator should be carefully engineered to the problem instance
and must include domain-specific knowledge. Different constructive allocation
schemes are proposed in |^.

Simulated_Evolution(B, P_initial, StoppingCondition)

NOTATION
  B    : Bias value.
  P    : Complete solution.
  e_i  : Individual link in P.
  O_i  : Lower bound on cost of the i-th link.
  C_i  : Current cost of the i-th link in P.
  g_i  : Goodness of the i-th link in P.
  S    : Queue to store the selected links.
  ALLOCATE(e_i, P_i) : Function to allocate e_i in partial solution P_i.

Begin
  Repeat
    EVALUATION: ForEach e_i ∈ P DO
      begin
        g_i = O_i / C_i
      end
    SELECTION: ForEach e_i ∈ P DO
      begin
        IF Random > Min(g_i + B, 1)
        THEN
          begin
            S = S ∪ {e_i}; Remove e_i from P.
          end
      end
    Sort the elements of S
    ALLOCATION: ForEach e_i ∈ S DO
      begin
        ALLOCATE(e_i, P_i)
      end
  Until Stopping Condition is satisfied
  Return Best solution.
End (Simulated_Evolution)

Fig. 2. Structure of the simulated evolution algorithm.
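The loop of Fig. 2 can be sketched in Python as follows. This is only an illustrative skeleton, not the authors' implementation: the callback names `evaluate`, `allocate`, and `cost`, the tracking of the best solution by a scalar cost, and the fixed iteration count used as the stopping condition are all assumptions made for the example.

```python
import random


def simulated_evolution(solution, evaluate, allocate, cost, bias, iterations, rng):
    """Generic SE loop: evaluation, selection, allocation (illustrative sketch).

    solution : list of hashable, distinct elements (e.g. links)
    evaluate : (element, solution) -> goodness in [0, 1]
    allocate : (element, partial_solution) -> replacement element
    cost     : solution -> float, lower is better (assumed scalar cost)
    """
    best, best_cost = list(solution), cost(solution)
    for _ in range(iterations):
        # EVALUATION: goodness of every element in its current location.
        goodness = {e: evaluate(e, solution) for e in solution}

        # SELECTION: elements with low goodness are more likely to be picked.
        selected = [e for e in solution
                    if rng.random() > min(goodness[e] + bias, 1.0)]
        chosen = set(selected)
        partial = [e for e in solution if e not in chosen]

        # Sort the selection queue, worst element first.
        selected.sort(key=goodness.get)

        # ALLOCATION: re-insert each selected element into the partial solution.
        for e in selected:
            partial.append(allocate(e, partial))

        solution = partial
        current_cost = cost(solution)
        if current_cost < best_cost:
            best, best_cost = list(solution), current_cost
    return best, best_cost
```

With a purely random `allocate`, this loop degenerates into the random walk mentioned above; the problem-specific knowledge is expected to live in that callback.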

Though SE falls in the category of meta-heuristics such as simulated anneal- 
ing (SA) and genetic algorithm (GA), there are significant differences between 
these heuristics (see 0). A classification of meta-heuristics proposed by Glover 
and Laguna jS| is based on three basic features: (1) the use of adaptive memory 
where the letter A is used if the meta-heuristic employs adaptive memory and 
the letter M is used if it is memoryless; (2) the kind of neighborhood explo- 
ration, where the letter N is used if the meta-heuristic performs a systematic 
neighborhood search and the letter S is used if stochastic sampling is followed; 
and (3) the number of current solutions carried from one iteration to the next, 
where the digit 1 is used if the meta-heuristic maintains a single solution, and the 
letter P is used if a parallel search is performed with a population of solutions 
of cardinality P. For example, according to this classification, the genetic algorithm
is M/S/P, tabu search is A/N/1, and both simulated annealing and simulated 
evolution are M/S/1. The heuristic proposed in this work is A/S/1. 

2.2 Fuzzy Logic

Fuzzy Logic is a mathematical discipline invented to express human reasoning 
in rigorous mathematical notation. Unlike classical reasoning in which a propo- 
sition is either true or false, fuzzy logic establishes approximate truth value of 
proposition based on linguistic variables and inference rules. A linguistic vari- 
able is a variable whose values are words or sentences in natural or artificial 
language 0. By using hedges like ‘more’, ‘many’, ‘few’ etc., and connectors like 
AND, OR, and NOT with linguistic variables, an expert can form rules, which 
will govern the approximate reasoning. 

During the topology design process, some desirable objectives, such as the 
delay, can only be imprecisely estimated. Fuzzy logic provides a rigorous al- 
gebra for dealing with imprecise information. Furthermore, it is a convenient 
method of combining conflicting objectives and expert human knowledge. From 
the pseudocode of the SE algorithm given in Fig. 2, it is clear that there are two
phases of the algorithm which could be modeled to include multiple objectives. 
These phases are evaluation and allocation. We have used fuzzy logic based 
reasoning in these two phases. 

3 Assumptions and Notation 

In this work, we have assumed the following: 

— The (x, y) location of each host is given.

— All hosts have either Ethernet (10 or 100 Mbps) or Token Ring (4 or 16 
Mbps) interfaces. 

— The traffic rates generated among pairs of hosts are assumed known. 

— Vertical cabling (interconnection of local sites to backbone switches) is im- 
plemented with fiber optic cables. 

— Horizontal cabling portion (cabling within the work area/local site) is im- 
plemented with Category 5 UTP (or STP for Token-Ring). 

— The root node is a switch acting as a collapsed backbone with given required 
interfaces. 

— There is a user specified limit on the number of network addresses per subnet. 

— Maximum allowed utilization of any link should not exceed a desired thresh- 
old (e.g. 60 %). 

For the following sections, we shall use the notation given below: 

$n$                 number of clusters/local sites.

$m$                 number of LAN segments in a cluster.

$T$                 $n \times n$ local site topology matrix, where $t_{ij} = 1$ if local sites $i$ and $j$
                    are connected and $t_{ij} = 0$ otherwise.

$\lambda_i$         traffic on link $i$.

$\lambda_{max,i}$   capacity of link $i$.

$L$                 number of links of the proposed topology.

$D_{nd}$            average delay between any source destination pair.

$p_i$               maximum number of clusters which can be connected to device $i$.

$\gamma_{ij}$       external traffic between clusters $i$ and $j$.

$\gamma$            overall external traffic.



4 Problem Statement 

We seek to find a feasible topology of near optimum overall cost. A feasible 
topology is one that satisfies design constraints. Optimality of a topology is 
measured with respect to three objectives: monetary cost, average network delay
per packet (network latency) , and maximum number of hops between any source- 
destination pair. 

Three important constraints are considered. 

1. The first set of constraints is dictated by bandwidth limitation of the links.
A good network would be one in which links are “reasonably” utilized,
otherwise this would cause delays, congestion, and packet loss. Thus the traffic
flow on any link $i$ must never exceed a threshold value:

$$\lambda_i \le \lambda_{max,i}, \quad i = 1, 2, \ldots, s \qquad (1)$$

where $s$ is the total number of links present in the topology.

2. The second constraint is that the number of clusters attached to a network
device $i$ must not be more than the port capacity $p_i$ of that device.

$$\sum_{j=1}^{n} t_{ij} \le p_i, \quad i = 1, 2, \ldots, n, \quad \forall i \ne j \qquad (2)$$

3. The third set of constraints express the designer’s desire to enforce certain 
hierarchies on the network devices. For example, one might not allow a hub 
to be the parent of a router or backbone device. 
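The first two constraints lend themselves to simple feasibility checks. The sketch below is an illustration under assumptions: the function names are hypothetical, traffic and capacity are plain lists indexed by link, and the topology matrix is a list of 0/1 rows as in the notation of Section 3.

```python
def traffic_within_capacity(traffic, capacity):
    """Constraint (1): traffic on every link must not exceed its threshold."""
    return all(t <= c for t, c in zip(traffic, capacity))


def ports_within_capacity(topology, ports):
    """Constraint (2): clusters attached to device i must not exceed p_i.

    topology : n x n matrix of 0/1 entries t_ij (1 if sites i and j are connected)
    ports    : list of port capacities p_i
    """
    n = len(topology)
    return all(
        sum(topology[i][j] for j in range(n) if j != i) <= ports[i]
        for i in range(n)
    )
```

A candidate topology would be rejected as infeasible as soon as either check returns `False`; the third (hierarchy) constraint depends on device types and is not modeled here.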

Below, we describe the objective criteria used to measure the goodness of a given 
topology. 



Monetary cost: The goal is to find the topology with minimum possible cost, 
while meeting all the requirements and constraints. The cost of the cable and 
the cost of the network devices are the two main entities affecting the monetary 
cost, therefore: 



$$\text{cost} = (l \times C_{cable}) + (C_{nd}) \qquad (3)$$

where $l$ represents the total length of cable, $C_{cable}$ represents the cost per unit of
the cable used, and $C_{nd}$ represents the combined costs of all the routers, switches,
and hubs used.

Average Network Delay: The second objective is to minimize the average
network delay, while considering the constraints and requirements. To devise a
suitable function for average network delay, we approximate the behavior of a
link and network device by an M/M/1 queue p. The delay due to the
network device between local sites $i$ and $j$ is $B_{ij} = \mu b_{ij}$, where $\mu$ is the average
packet size in bits, $b_{ij}$ is the delay per bit, and $B_{ij}$ is the delay per packet. If $\gamma_{ij}$ is the total traffic
through the network device between local sites $i$ and $j$, then the average delay
due to all network devices is:

$$D_{nd} = \frac{1}{\gamma} \sum_{i=1}^{d} \sum_{j=1}^{d} \gamma_{ij} B_{ij} \qquad (4)$$

where $d$ is the total number of network devices in the network. Thus, the total
average network delay is composed of delays of links and network devices and is
given by [S|



Maximum number of hops between any source-destination pair: The
maximum number of hops between any source-destination pair is another
objective to be optimized. A hop is counted as the packet crosses a network
device.
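For a tree topology, the maximum number of hops can be computed with two breadth-first searches. The sketch below is illustrative and rests on an assumption not stated in the text: each edge of the tree is counted as one hop, so the result is the length in edges of the longest path. The adjacency-dictionary representation and the function names are likewise hypothetical.

```python
from collections import deque


def farthest_node(adjacency, start):
    """Breadth-first search returning (node, distance) of a node farthest from start."""
    distance = {start: 0}
    queue = deque([start])
    last = start
    while queue:
        node = queue.popleft()
        last = node  # BFS pops nodes in non-decreasing distance order
        for neighbor in adjacency[node]:
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    return last, distance[last]


def max_hops(adjacency):
    """Longest path (in edges) between any two nodes of a tree."""
    end, _ = farthest_node(adjacency, next(iter(adjacency)))
    _, hops = farthest_node(adjacency, end)
    return hops
```

The double search is valid only for trees, which matches the restriction to tree design adopted in Section 5.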

5 Proposed Algorithm and Implementation Details 

This section describes our proposals of fuzzification of different stages of the SE 
algorithm. We confine ourselves to tree design. Trees are minimal and provide
a unique path between every pair of local sites. Further, the design of a general
mesh topology usually starts from a near optimal constrained spanning tree. 

5.1 Initialization 

The initial spanning tree topology is generated randomly, while taking into
account the feasibility constraints mentioned earlier.

5.2 Proposed Fuzzy Evaluation Scheme 

The goodness of each individual is computed as follows. In our case, an 
individual is a link which interconnects the devices of two local sites (at the 
backbone level) or two network devices (at the local site level). In the fuzzy 
evaluation scheme, monetary cost and optimum depth of a link (with respect 
to the root) are considered fuzzy variables. Then the goodness of a link is 
characterized by the following rule. 




Rule 1: IF a link is near optimum cost AND near optimum depth
THEN it has high goodness.



Here, near optimum cost, near optimum depth, and high goodness are linguistic
values for the fuzzy variables cost, depth, and goodness. Using the and-like
compensatory operator m, Rule 1 translates to the following equation for the fuzzy
goodness measure of a link $l_i$.

$$g_{l_i} = \mu^e(l_i) = \alpha^e \times \min(\mu^e_1(l_i), \mu^e_2(l_i)) + (1 - \alpha^e) \times \frac{1}{2} \sum_{j=1}^{2} \mu^e_j(l_i) \qquad (6)$$

The superscript $e$ stands for evaluation and is used to distinguish similar
notation in other fuzzy rules. In (6), $\mu^e(l_i)$ is the membership in the fuzzy set
of high goodness links and $\alpha^e$ is a constant. The $\mu^e_1(l_i)$ and $\mu^e_2(l_i)$ represent
memberships in the fuzzy sets near optimum monetary cost and near optimum
depth.

In order to find the membership of a link with respect to near optimum
monetary cost, we proceed in the following manner. From the cost matrix, which
gives the costs of each possible link, we find the minimum and maximum costs
among all the link costs. We take these minimum and maximum costs as the lower
and upper bounds and call them “LCostMin” and “LCostMax” respectively and
then find the membership of a link with respect to these bounds. Furthermore, in
this work, we have normalized the monetary cost with respect to “LCostMax”.
The required membership function is represented as depicted in Fig. 3, where
the x-axis represents LCost/LCostMax, the y-axis represents the membership value,
A = LCostMin/LCostMax, and B = LCostMax/LCostMax = 1. This normalization
enables us to use the same membership function for all topology design instances.




Fig. 3. Membership function for the objective to be optimized. 

In the same manner, we can find the membership of a link with respect
to near optimum depth. The lower limit, which we call “LDepthMin”, is taken
to be a depth of 1 with respect to the root. The upper bound, which we call
“LDepthMax”, is taken to be 1.5 times the maximum depth generated in the
initial solution, subject to a maximum given by a user-specified limit¹. For example, if in the
initial solution, the maximum depth turns out to be 4, then “LDepthMax” for
the depth membership function would be 6. This is done to give a chance to links
which may have more depth than the one in the initial solution. If we take the
initial solution maximum depth as “LDepthMax”, then in the following iterations
some links with higher depths will have a membership value of zero (with respect
to the depth membership function) and thus they will not be able to play any role
as far as depth is concerned. However, due to technological limitations, we have
limited the maximum possible depth to 7, in the case when “LDepthMax” turns
out to be more than 4. The reason for having the maximum depth of 7 is that
the hop limit for RIP is 15. This means that if a maximum depth of 7 is taken,
then in the worst case we would have a total of 14 hops from a source to a
destination. The membership function with respect to near optimum depth can
be represented as illustrated in Fig. 3, where the x-axis represents LDepth, the y-axis
represents the membership value, A = LDepthMin, and B = LDepthMax.
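A minimal sketch of such a membership function is given below. The exact shape of Fig. 3 is not recoverable from the text, so the linear decrease from 1 at A to 0 at B is an assumption, as are the function names; the cap of 7 on the depth bound follows the RIP-based limit discussed above.

```python
def decreasing_membership(x, a, b):
    """Membership that is 1 up to a, 0 from b onward, and linear in between.

    The linear segment is an assumed shape for the function of Fig. 3.
    """
    if x <= a:
        return 1.0
    if x >= b:
        return 0.0
    return (b - x) / (b - a)


def depth_upper_bound(initial_max_depth, user_limit=7):
    """LDepthMax: 1.5 times the initial maximum depth, capped by the user limit."""
    return min(1.5 * initial_max_depth, user_limit)


# Example with the n15 link-cost bounds of Table 1 (LCostMin = 1100, LCostMax = 9400);
# the link cost of 5250 is a made-up value for illustration.
a = 1100 / 9400
b = 9400 / 9400
cost_membership = decreasing_membership(5250 / 9400, a, b)
```

Because costs are normalized by the maximum, B is always 1 for the cost-based memberships, whereas the depth membership uses the unnormalized bounds directly.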



5.3 Selection 

In this stage of the algorithm, for each link $l_i$ in the current tree topology, where
$i = 1, 2, \ldots, n-1$, a random number RANDOM ∈ [0, 1] is generated and compared
with $g_i + B$, where $B$ is the selection bias. If RANDOM > $g_i + B$, then link $l_i$ is
selected for allocation and considered removed from the topology. Bias $B$ is used
to control the size of the set of links selected for removal. A bias methodology
called variable bias HH has been used in this paper. The variable bias is a
function of the quality of the current solution. When the overall solution quality is poor,
a high value of bias is used, otherwise a low value is used. Average link goodness
$G_k$ is a measure of how many “good” links are present in the topology. The bias
value changes from iteration to iteration depending on the quality of the solution.
The variable bias is calculated as follows:

$$B_k = 1 - G_k$$

where $B_k$ is the bias for iteration $k$ and $G_k$ is the average goodness of all the
links at the beginning of iteration $k$.
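The selection step with variable bias can be sketched as follows. The dictionary-based representation of link goodness and the function names are assumptions made for the example; the comparison and the bias formula follow the description above.

```python
import random


def variable_bias(goodness_values):
    """B_k = 1 - G_k, where G_k is the average goodness of all links."""
    return 1.0 - sum(goodness_values) / len(goodness_values)


def select_links(goodness, rng):
    """Return (selected links sorted by ascending goodness, bias used).

    goodness : dict mapping each link to its goodness in [0, 1]
    rng      : random.Random instance supplying RANDOM in [0, 1)
    """
    bias = variable_bias(list(goodness.values()))
    selected = [link for link, g in goodness.items() if rng.random() > g + bias]
    selected.sort(key=goodness.get)  # worst link first, ready for allocation
    return selected, bias
```

For instance, with two links of goodness 0.2 and 0.8 the bias is 0.5, so the second link can never be selected (0.8 + 0.5 exceeds 1), while the first is selected whenever the random number exceeds 0.7.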



5.4 Proposed Fuzzy Allocation Scheme 

During the allocation stage of the algorithm, the selected links are removed 
from the topology one at a time. For each removed link, new links are tried 
in such a way that they result in overall better solution. Before the allocation 
step starts, the selected links are sorted according to their goodness values in 
ascending order. 

¹ This user-specified limit may be a design constraint, e.g., if each hop represents a
router that uses the Routing Information Protocol (RIP), then the limit would be 7, i.e.,
a branch of the tree should not have more than 7 routers.

In the fuzzy allocation scheme, the three criteria to be optimized are combined
using fuzzy logic to characterize a good topology. The reason for using fuzzy logic 
is that the characterization of a good topology with respect to several criteria is 
usually based on heuristic knowledge which is acquired through experience. Such 
knowledge is most conveniently expressed in linguistic terms, which constitute 
the basis of fuzzy logic. For the problem addressed in this paper, a good topology 
is one that is characterized by a low monetary cost, low average network delay, 
and a small maximum number of hops. In fuzzy logic, this can easily be stated 
by the following fuzzy rule: 

Rule 2: IF a solution X has low monetary cost AND low average network 
delay AND low maximum number of hops between any source- 
destination pair THEN it is a good topology. 

The words “low monetary cost” , “low average network delay” , “low maximum 
number of hops” , and “good topology” are linguistic values, each defining a fuzzy 
subset of solutions. For example, “low average network delay” is the fuzzy subset 
of topologies of low average network delays. Each fuzzy subset is defined by a 
membership function $\mu$. The membership function returns a value in the interval
[0,1] which describes the degree of satisfaction with the particular objective
criterion. Using the and-like ordered weighted averaging operator mil, the above
fuzzy rule reduces to the following equation.

$$\mu^a(x) = \beta^a \times \min(\mu^a_1(x), \mu^a_2(x), \mu^a_3(x)) + (1 - \beta^a) \times \frac{1}{3} \sum_{i=1}^{3} \mu^a_i(x) \qquad (7)$$

where $\mu^a(x)$ is the membership value for solution $x$ in the fuzzy set good
topology and $\beta^a$ is a constant in the range [0,1]. The superscript $a$ stands for
allocation. Here, $\mu^a_i(x)$ for $i = \{1,2,3\}$ represents the membership values of solution $x$
in the fuzzy sets low monetary cost, low average network delay, and low maximum
number of hops between any source-destination pair respectively. The solution
which results in the maximum value for (7) is reported as the best solution found
by the SE algorithm.
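The and-like ordered weighted averaging operator used in the evaluation and allocation rules is a weighted blend of the minimum and the mean of the individual memberships. A minimal sketch, with a hypothetical function name and made-up membership values:

```python
def and_like_owa(memberships, beta):
    """beta * min(mu_i) + (1 - beta) * mean(mu_i), with beta in [0, 1]."""
    mean = sum(memberships) / len(memberships)
    return beta * min(memberships) + (1.0 - beta) * mean


# Illustrative memberships for cost, delay, and hops (not taken from the paper).
topology_quality = and_like_owa([0.9, 0.6, 0.3], beta=0.5)
```

With `beta = 1` the operator reduces to a strict fuzzy AND (pure minimum); with `beta = 0` it becomes the plain average, so the constant controls how strongly the worst criterion dominates.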

Below we describe how to get the membership functions for the three criteria 
mentioned above. 



Membership Function for Monetary Cost. First, we determine two
extreme values for monetary cost, i.e., the minimum and maximum values.
The minimum value, “TCostMin”, is found by using the Esau-Williams
algorithm na, with all the constraints completely relaxed. This will surely give us
the minimum possible monetary cost of the topology. The maximum value of
monetary cost, “TCostMax”, is taken to be the monetary cost generated in the
initial solution. The monetary cost is normalized with respect to “TCostMax”.
The corresponding membership function is shown in Fig. 3, where the x-axis
represents TCost/TCostMax, the y-axis represents the membership value,
A = TCostMin/TCostMax, and B = TCostMax/TCostMax = 1.

Membership Function For Average Network Delay. We determine two
extreme values for average network delay. The minimum value, “TDelayMin”, is
found by connecting all the nodes to the root directly, ignoring all the constraints
and then calculating the average network delay using (5). The maximum value
of average delay, “TDelayMax”, is taken to be the average delay generated in
the initial solution. The average delay is normalized with respect to “TDelayMax”.
The membership function is shown in Fig. 3, where the x-axis represents
TDelay/TDelayMax, the y-axis represents the membership value,
A = TDelayMin/TDelayMax, and B = TDelayMax/TDelayMax = 1.



Membership Function For Maximum Number of Hops. Again, two ex- 
treme values are determined. The minimum value, “THopsMin”, is taken to 
be 1 hop, which will be the minimum possible in any tree. The maximum 
value, “THopsMax” , is taken to be the maximum number of hops between any 
source-destination pair generated in the initial solution. The membership
function is shown in Fig. 3, where the x-axis represents THops, the y-axis represents
the membership value, A = THopsMin, and B = THopsMax.

In the proposed allocation scheme, all the selected links are removed one at 
a time and trial links are placed for each removed link. We start with the head- 
of-line link, i.e. the link with the worst goodness. We remove this link from the 
topology. This divides the topology into two disjoint trees. Now the placing of 
trial links begins. In this work, the approach to place trial links is as follows. At 
most ten trial moves (i.e., trial links) are evaluated for each removed link. One 
point to mention is that for the ten moves, some moves may be invalid. However, 
we search for only four “valid” moves. Whenever we find four valid moves, we 
stop, otherwise we continue until a total of ten moves are evaluated (whether 
valid or invalid). The removal of a link involves two nodes P and Q, of which 
node P belongs to the subtree which contains the root node and node Q belongs 
to the other subtree. For the ten moves we make, five of them are greedy and 
five are random. For the greedy moves, we start with node Q and five nearest 
nodes in the other subtree are tried. For the random moves, we select any two 
nodes in the two subtrees and connect them. If all ten moves are invalid,
the original link is placed back in its position. The valid moves
are evaluated based on (7) and the best move among the ten moves is made
permanent. This procedure is repeated for all the links that are present in the
set of selected links. 
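The trial-move procedure for one removed link can be sketched as below. This is an illustration under several assumptions: greedy moves are tried before random ones, nodes carry (x, y) coordinates in a `coords` dictionary, and the callbacks `is_valid` (constraint check) and `score` (the fuzzy membership of the resulting topology) are hypothetical names supplied by the caller.

```python
import math
import random


def best_trial_link(q, root_side, other_side, coords, is_valid, score, rng,
                    n_greedy=5, n_random=5, max_valid=4):
    """Return the best valid trial link (u, v), or None if every trial is invalid.

    q          : endpoint of the removed link in the subtree without the root
    root_side  : list of nodes in the subtree containing the root
    other_side : list of nodes in the other subtree (contains q)
    """
    def distance(u, v):
        return math.dist(coords[u], coords[v])

    # Greedy moves: connect Q to its nearest nodes in the root-side subtree.
    nearest = sorted(root_side, key=lambda p: distance(p, q))[:n_greedy]
    trials = [(p, q) for p in nearest]

    # Random moves: connect any node of one subtree to any node of the other.
    trials += [(rng.choice(root_side), rng.choice(other_side))
               for _ in range(n_random)]

    valid = []
    for move in trials:
        if is_valid(move):
            valid.append(move)
            if len(valid) == max_valid:  # stop once four valid moves are found
                break
    return max(valid, key=score) if valid else None
```

When the function returns `None`, the caller would put the original link back in its position, as described above.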

We have implemented two variations of allocation schemes. The first one is 
the same as has been described above, which we call SE. In the second variation,
tabu search characteristics have been introduced, details of which follow.



5.5 Tabu Search Based Allocation 

Tabu Search (TS) is a general iterative heuristic that is used for solving combina- 
torial optimization problems. The algorithm was first presented by F. Glover |S|. 

Table 1. Characteristics of test cases used in our experiments. LCostMin, LCostMax,
and TCostMin are in dollars. TDelayMin is in milliseconds. Traffic is in Mbps.

Name   # of Local Sites   LCostMin   LCostMax   TCostMin   TDelayMin   Traffic
n15    15                 1100       9400       325400     2.14296     24.63
n25    25                 530        8655       469790     2.15059     74.12
n33    33                 600        10925      624180     2.15444     117.81
n40    40                 600        11560      754445     2.08757     144.76
n50    50                 600        13840      928105     2.08965     164.12

A key feature of TS is that it imposes restrictions on the search process, pre- 
venting it from moving in certain directions to drive the process through regions 
desired for investigation |Sj. It searches for the best move in the neighborhood 
of the current solution. 

In this work, we have modified the SE algorithm by introducing Tabu Search 
characteristics in the allocation phase. Recall that in the allocation phase, a
certain number of moves are made for each link in the selection set and the best
move is accepted, making the move (i.e., link) permanent. This newly accepted 
link is saved in a tabu list. Thus our attribute is the link itself. The aspiration 
criterion adopted is that if the link that had been made tabu produces a higher 
membership value than the current one in the membership function “good topol- 
ogy”, then we will override the tabu status of the link and make it permanent. 
This strategy prevents the selection and allocation operators from repetitively 
removing the same link and replacing it with a link of equal or worse goodness. 
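The tabu list and aspiration criterion can be sketched as follows. The text does not say how the list is managed when full, so the first-in-first-out eviction via a bounded deque is an assumption, as are the class and function names; the `rejected` counter mirrors the tabu counter reported in the results.

```python
from collections import deque


class TabuList:
    """Fixed-size list of recently accepted links (FIFO eviction assumed)."""

    def __init__(self, size):
        self._links = deque(maxlen=size)
        self.rejected = 0  # tabu moves that failed the aspiration criterion

    @staticmethod
    def _key(link):
        # Links are undirected, so (u, v) and (v, u) are the same attribute.
        return tuple(sorted(link))

    def is_tabu(self, link):
        return self._key(link) in self._links

    def add(self, link):
        key = self._key(link)
        if key in self._links:
            self._links.remove(key)  # avoid duplicate entries
        self._links.append(key)


def accept_move(link, new_membership, current_membership, tabu):
    """Accept a link unless it is tabu and fails the aspiration criterion."""
    if tabu.is_tabu(link) and new_membership <= current_membership:
        tabu.rejected += 1
        return False
    tabu.add(link)  # the accepted link becomes tabu
    return True
```

Here `new_membership` and `current_membership` stand for the values of the good topology membership with and without the candidate link; a tabu link is admitted only when it strictly improves that value.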



5.6 Stopping Criterion 

In our experiments, we have used a fixed number of iterations as a stopping 
criterion. We experimented with different values of iterations and found that for 
all the test cases, the SE algorithm converges within 4000 iterations or fewer.



6 Results and Discussion 

The SE algorithm described in this paper has been tested on several randomly 
generated networks. For each test case, the traffic generated by a typical local 
site was collected from real sites. Other characteristics, such as the number of 
ports on a network device, its type, etc. were assumed. However, the costs of 
the network devices and links were collected from vendors. The characteristics of 
test cases are listed in Table 1. The smallest test network has 15 local sites and
the largest has 50 local sites. The hierarchies in which the devices are connected 
are that backbone switch is at the top, followed by routers, then workgroup 
switches, and then hubs. 

Table 2. Best solution for different tabu list sizes. Monetary cost is in dollars, delay
is in milliseconds per packet, and execution time is in minutes.

Test Case   Tabu list size   Monetary Cost   Avg. Delay   Max. Hops
n15         1                298200          2.935        4
            2                297100          2.78         4
            3                294350          3.448        6
            4                298100          3.037        5
            5                296900          3.278        6
n25         3                481745          4.219        8
            4                478690          4.189        9
            5                483210          3.537        6
            6                479915          4.275        9
            7                488400          4.608        9
n33         3                655715          5.772        11
            5                652785          4.77         8
            6                682465          4.19         6
            7                652310          5.95         10
            9                667100          5.087        7
n40         5                785795          4.746        10
            6                798695          8.019        12
            7                783970          4.441        9
            8                786950          5.478        9
            9                790645          5.136        8
n50         4                958995          6.739        14
            5                967110          9.279        14
            7                983020          5.245        11
            8                1075450         5.725        9
            9                971965          7.13         12

6.1 Effect of Tabu Search Based Allocation and Tabu List Size 

Table 2 shows the results obtained for the test cases using different tabu list
sizes. In this table, monetary cost, average delay, and maximum hops of the best
solutions are reported along with the respective tabu list size. We notice that
as the test case size increases, the tabu list size that gives the best solution
also increases. For example, in n15, a tabu list size of 2 gives the best
solution. Similarly, the best solutions are achieved by tabu list sizes of 5, 6,
7, and 7 in n25, n33, n40, and n50 respectively.

Table 3 gives the results for the different test cases in terms of the frequency
of tabu moves, together with the tabu list size that gave the best solution and
the corresponding execution time. By frequency of tabu moves we mean the number
of times a link was found tabu. We record this through a counter called the tabu
counter. The tabu counter only includes the tabu links that could not pass the
aspiration criteria. It does not count links that were tabu but managed to pass
the aspiration criteria. From this table, it can be seen that the percentage of
tabu moves varies between 1% and 10%.

Table 3. Results for best tabu list size. Execution time is in minutes.

Test case   Tabu list size      Total moves   Tabu moves   % Tabu moves   Exec. time
            for best solution
n15         2                   1241          45           3.62           2.25
n25         5                   2496          39           1.56           4
n33         6                   1352          93           6.878          8
n40         7                   4223          233          5.51           26
n50         7                   3995          328          8.21           65

6.2 Comparison of SE and SE_TS

In this section, the results of SE_TS and SE are compared. Table 4 shows the
results for SE and for SE_TS with its best tabu list size. The percentage gain
shows the improvement achieved by SE_TS when compared to SE. From this table, it
is seen that SE_TS performs better than SE as far as the monetary cost objective
is concerned: a gain is achieved by SE_TS in all the test cases. For example, a
gain of 5.67% is achieved for n50. A similar behavior is seen for the average
network delay metric, where SE_TS achieves a gain in all the cases. For the
maximum number of hops metric, a gain is achieved for n15, n25, and n33, while
n40 is unchanged and n50 shows a loss. However, the loss in maximum hops for n50
is compensated by the improvement in the monetary cost and delay metrics. The
execution times of the two algorithms are also comparable.

To compare the quality of the search between SE and SE_TS, we plot different
optimization parameters versus the iteration count of the algorithms for the
test case n40 (best tabu list size = 7 in SE_TS). Fig. 4(a) compares the current
monetary cost. From this plot it is clear that SE_TS converges a little faster
towards a better solution. Somewhat similar behavior is seen with respect to the
average network delay and maximum hops parameters in Figs. 4(b) and (c), where
we see that SE_TS performs better than SE. The reason SE_TS performs better than
SE is the following. In SE, since the search space for valid solutions is
limited, the same moves are repeated after some iterations, and the algorithm
keeps searching in the same region most of the time. In SE_TS, more of the
search space is covered because previous moves remain tabu for some time,
causing the algorithm to diversify the search into another subarea. Recall that
in the allocation phase, four valid moves are evaluated for each link in the
selection set and the best move (link) is made permanent. This new link is also
saved in the tabu list at the same time. It may happen that this new link
becomes "bad" (in terms of the evaluation function) in the following iterations,
upon which it is removed. The link may become good again after one or more
iterations, but since it is in the tabu list, it will not be chosen to be placed
again, thus giving room to other links to be chosen. In general we see that
SE_TS achieves better quality solutions than SE.

Fig. 4. Comparison of SE and SE_TS for n40.

Table 4. Comparison of SE and SE_TS. C = Cost in dollars, D = Delay in milliseconds
per packet, H = hops, T = execution time in minutes, TL = Tabu list size. Percentage
gain shows improvement achieved by SE_TS compared to SE.

        SE                            SE_TS                               % Gain
Case    C         D       H    T      TL   C        D       H    T       C      D      H
n15     305500    4.135   7    1      2    297100   2.78    4    2.25    2.7    32.8   42.8
n25     512415    4.37    7    4.4    5    483210   3.537   6    4       5.7    19.1   14.28
n33     702815    5.319   7    17     6    682465   4.19    6    8       2.89   21.2   14.28
n40     789625    5.529   9    42     7    783970   4.441   9    26      0.72   19.7   0
n50     1042080   8.236   10   62     7    983020   5.245   11   65      5.67   36.3   -9.1

7 Conclusion

In this paper we have presented a novel approach for topology design of campus
networks based on the fuzzy simulated evolution algorithm, with two variations in
the allocation scheme. Results obtained for the test cases considered suggest
that the fuzzy simulated evolution algorithm with tabu search allocation is a
robust approach to this problem, and was always able to find good quality
feasible solutions.

Abstract. Solving workshop problems such as the Flow Shop or the Job Shop is of
great importance in many industrial areas. The criteria to optimize are
generally the minimization of the makespan or of the tardiness. However, few
solution approaches take these different criteria into account simultaneously.
This paper presents an approach based on hybrid genetic algorithms adapted to
the multicriteria case. Several selection and diversity-maintaining strategies
are presented. Their performances are evaluated and compared using different
benchmarks. A parallel model is also proposed and implemented for the hybrid
metaheuristic. It allows the population size and the number of generations to be
increased, which leads to better results.

Keywords: Genetic Algorithm, Multicriteria Optimization, Flow Shop, Hybrid
Metaheuristic, Local Search, Parallel Metaheuristic.



1 Introduction 

The Flow Shop problem has received great attention [6][16] because of its
importance in many industrial areas [13]. The methods proposed for solving it
range from exact methods such as the branch & bound algorithm [3] to specific
heuristics [7][12][16] and metaheuristics [10][11]. However, the majority of
these works study the problem in its single-criterion form and aim mainly to
minimize the makespan.

Population-based algorithms such as genetic algorithms (GAs) have turned out to
be very efficient for multicriteria combinatorial optimization problems. The
difficulty of the multicriteria case lies in the absence of a total order
relation linking the solutions of the problem. For GAs, this shows up as the
difficulty of designing a selection operator that assigns selection
probabilities proportional to the desirability of the individuals in the
population. Another difficulty is the balance between the exploration of the
search space and the exploitation of the Pareto frontier obtained so far.
Advanced mechanisms have been proposed to deal with this issue, such as combined
sharing in the objective and the decision space, hybrid GAs with local search,
and a parallel model and implementation of the algorithm.

The next section of this paper presents the multicriteria flow shop problem we
are interested in. We formulate the different objectives to optimize as well as
the constraints to satisfy. In the third section, we describe the application of
genetic algorithms to the problem [16][17][18]. Different selection strategies
are presented and their performances compared. We present the diversity
maintaining methods implemented and their contribution to the quality of the
solutions. The fourth section is devoted to the hybridization of multicriteria
GAs with local search, and its contribution is underlined. In the fifth section,
we describe and evaluate a parallel model for the proposed metaheuristic.



2 A Multicriteria Flow Shop Problem 

The flow shop problem can be presented as a set of $N$ jobs $\{J_1, J_2, \dots, J_N\}$
to schedule on $M$ machines. The machines are critical resources: one machine
cannot be assigned to two jobs simultaneously. Each job $J_i$ is composed of $M$
consecutive tasks, $J_i = \{t_{i1}, \dots, t_{iM}\}$, where $t_{ij}$ represents the
$j$-th task of the job $J_i$, requiring the machine $m_j$. Following this
description, jobs have the same processing sequence on the machines. A
processing time $p_{ij}$ is associated with each task $t_{ij}$, and each job $J_i$
must be completed before its due date $d_i$.

Scheduling of tasks on different machines must optimize certain regular criteria
[16]. These criteria vary according to the specifics of the problem treated, and
generally consist in the minimization of the following objectives [16]:

$C_{\max}$: makespan (total completion time);

$\bar{C}$: mean value of the completion times of the jobs;

$T_{\max}$: maximum tardiness; $T$: total tardiness;

$U$: number of jobs delayed with regard to their due date $d_i$;

$F_{\max}$: maximum flow time; $\bar{F}$: mean flow time.

The optimization criteria taken into account are reduced to two objectives:
minimizing the makespan and the total tardiness.

$$f_1 = C_{\max} = \max_{i \in [1..N]} \left( s_{iM} + p_{iM} \right)$$

$$f_2 = T = \sum_{i \in [1..N]} \max\left(0,\; s_{iM} + p_{iM} - d_i\right)$$

Here $s_{iM}$ is the starting time of the last task of job $J_i$.

We are interested in the study of the permutation flow shop problem
$F/\mathrm{permu}, d_i/(C_{\max}, T)$, where jobs must be scheduled in the same
order on all the machines.

3 GA and Multicriteria Flow Shop Problem



Applying GAs to a given problem first requires a chromosomal representation of a
solution (in our case, the schedule of jobs). Since the processing sequence of
jobs on the machines is identical, a schedule is considered as a permutation
defining the processing order of the jobs on the machines. The coding used is an
array of jobs: the position of a job in the array defines its order in the
sequence.

Once a sequence of jobs is defined, all tasks are scheduled as early as possible
(respecting precedence constraints between tasks of the same job and preventing
any machine from being allocated to two tasks simultaneously). The starting time
$s_{ij}$ of each task of each job may then be computed recursively as follows,
starting with the first job:

$$
s_{ij} =
\begin{cases}
0 & \text{if } J_i \text{ is the first job of the sequence and } j = 1,\\
s_{i(j-1)} + p_{i(j-1)} & \text{if } J_i \text{ is the first job of the sequence and } j \neq 1,\\
s_{i'j} + p_{i'j} & \text{if } J_i \text{ is not the first job of the sequence and } j = 1,\\
\max\left(s_{i'j} + p_{i'j},\; s_{i(j-1)} + p_{i(j-1)}\right) & \text{otherwise,}
\end{cases}
$$

where $i'$ denotes the job that immediately precedes job $i$ in the sequence.
This formula expresses the fact that a task $t_{ij}$ cannot be started until the
machine $m_j$ has finished processing the previous task $t_{i'j}$ and the
previous task $t_{i(j-1)}$ of the same job is over.
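
The recursion above, together with the two objectives of Section 2, can be sketched in a few lines of Python. This is a minimal illustration and not the authors' implementation: the names `start_times` and `evaluate`, the 0-based indexing, and the small instance in the usage note are assumptions made for the example.

```python
from typing import List, Sequence, Tuple


def start_times(order: Sequence[int], p: Sequence[Sequence[int]]) -> List[List[int]]:
    """Earliest start time s[i][j] of job i on machine j for a permutation schedule.

    order: processing order of the jobs (a permutation of job indices).
    p:     p[i][j] is the processing time of job i on machine j.
    """
    n_machines = len(p[0])
    s = [[0] * n_machines for _ in p]
    prev = None  # job that immediately precedes the current one (i' in the text)
    for i in order:
        for j in range(n_machines):
            # Time at which machine j has finished the previous job's task.
            machine_free = 0 if prev is None else s[prev][j] + p[prev][j]
            # Time at which the previous task of the same job is over.
            job_ready = 0 if j == 0 else s[i][j - 1] + p[i][j - 1]
            s[i][j] = max(machine_free, job_ready)
        prev = i
    return s


def evaluate(order: Sequence[int], p: Sequence[Sequence[int]],
             d: Sequence[int]) -> Tuple[int, int]:
    """Return (makespan, total tardiness) of the schedule defined by `order`."""
    s = start_times(order, p)
    last = len(p[0]) - 1
    completion = {i: s[i][last] + p[i][last] for i in order}
    makespan = max(completion.values())
    total_tardiness = sum(max(0, completion[i] - d[i]) for i in order)
    return makespan, total_tardiness
```

For a hypothetical instance with three jobs and two machines, `p = [[3, 2], [1, 4], [2, 1]]` and due dates `d = [5, 6, 9]`, the order `[1, 0, 2]` gives a makespan of 8 and a total tardiness of 2, while the order `[0, 1, 2]` gives 10 and 4.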

Applying a GA to a given problem also requires defining the genetic operators.
The mutation operator randomly chooses two points of the chromosome, moves the
last job before point 2 to the position just after point 1, and shifts the jobs
scheduled between the two points one place to the right. The crossover operator,
also called two-point crossover, generates one offspring from two parents [10].
Two points on Parent 1 are randomly chosen, defining two extremities that become
the extremities of the offspring. The jobs that are not already present in these
two extremities are then taken in the order in which they appear in Parent 2 to
fill the rest of the offspring.
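
One possible reading of these two operators is sketched below in Python. The function names, the use of explicit index arguments `a` and `b` in place of randomly drawn points, and the exact convention for which positions count as "extremities" are assumptions of this sketch; the paper does not fix these details.

```python
from typing import List


def shift_mutation(perm: List[int], a: int, b: int) -> List[int]:
    """Move the job at index b to index a (a < b).

    The jobs originally at indices a .. b-1 are shifted one place to the right.
    """
    child = list(perm)
    job = child.pop(b)
    child.insert(a, job)
    return child


def two_point_crossover(p1: List[int], p2: List[int], a: int, b: int) -> List[int]:
    """Build one offspring from two parent permutations.

    The extremities p1[:a] and p1[b:] are copied from Parent 1; the remaining
    jobs fill the middle in the order in which they appear in Parent 2.
    """
    head, tail = p1[:a], p1[b:]
    kept = set(head) | set(tail)
    middle = [job for job in p2 if job not in kept]
    return head + middle + tail
```

Because the middle section only receives jobs that are absent from the two extremities, the offspring is always a valid permutation, so no repair step is needed.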



3.1 Selection Operators 

In this study, we have implemented 6 multicriteria selection strategies. The
main differences between these methods lie in the way individuals of the
population are ranked and the selection probabilities are calculated.

3.1.1 Selection by Weighted Sum of Objectives

It was one of the first methods used for multicriteria optimization (used in
[10], for example). Based on the transformation of the problem into a
single-criterion problem, this method combines the different objective functions
into one single function, generally in a linear manner:

$$f(S_i) = \sum_{k \in [1..2]} \lambda_k f_k(S_i)$$

The weights $\lambda_k$ are taken in the interval [0..1] such that
$\sum_k \lambda_k = 1$ ($k = 1..n$). An individual $S_i$ then has a probability of
being selected equal to:

$$\pi(S_i) = \frac{f(S_i)}{\sum_{j \in [1..\varphi]} f(S_j)}$$

where $\varphi$ is the population size.



3.1.2 Parallel Selection 

This selection approach has been used in the VEGA algorithm. Half of the selected
individuals are selected with regard to their makespan. The remaining $\varphi/2$
individuals are selected with regard to their tardiness.

3.1.3 NSGA Selection 

In the NSGA selection [14] (Non-dominated Sorting Genetic Algorithm), the ranks
of individuals are calculated recursively, beginning with the non-dominated
individuals of the population. A rank equal to 1 is associated with the set
$E_1$ of non-dominated individuals of the current population. Rank $k$ is
associated with the set of individuals $E_k$ dominated only by individuals
belonging to $E_1 \cup E_2 \cup \dots \cup E_{k-1}$.

The selection probability of an individual $S_i$ of rank $n$ in the population
follows Baker's expression [1][16]:

$$\pi(S_i) = \frac{S(\varphi + 1 - R_i) + R_i - 2}{\varphi(\varphi - 1)}$$

where $S$ represents the selection pressure and

$$R_i = 1 + |E_n| + 2 \sum_{k \in [1..n-1]} |E_k|$$
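
As an illustration, the ranking and the selection probabilities above can be sketched as follows. The function names are hypothetical, objective vectors are assumed to be minimized, and the front extraction is the simple quadratic peeling procedure rather than an optimized sort.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(u: Point, v: Point) -> bool:
    """u dominates v (minimization): no worse everywhere, strictly better somewhere."""
    return all(a <= b for a, b in zip(u, v)) and any(a < b for a, b in zip(u, v))


def fronts(objs: Sequence[Point]) -> List[List[int]]:
    """Peel off the successive non-dominated sets E_1, E_2, ... as index lists."""
    remaining = list(range(len(objs)))
    result = []
    while remaining:
        front = [i for i in remaining
                 if not any(dominates(objs[j], objs[i]) for j in remaining if j != i)]
        result.append(front)
        remaining = [i for i in remaining if i not in front]
    return result


def selection_probabilities(objs: Sequence[Point], pressure: float) -> List[float]:
    """Selection probability of each individual; requires at least two individuals."""
    phi = len(objs)
    probs = [0.0] * phi
    better = 0  # number of individuals in strictly better fronts
    for front in fronts(objs):
        r = 1 + len(front) + 2 * better  # R_i, shared by the whole front
        for i in front:
            probs[i] = (pressure * (phi + 1 - r) + r - 2) / (phi * (phi - 1))
        better += len(front)
    return probs
```

With this definition of $R_i$, the values $R_i$ sum to $\varphi(\varphi + 1)$ over the population, so the probabilities sum to 1 for any selection pressure $S$.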



3.1.4 NDS Selection 

In the NDS selection (Non-Dominated Sorting), the rank of an individual is equal
to the number of solutions dominating this individual plus one [4]:

$$Rank(S_i) = \left|\{ S_j \in Population : S_j \text{ dominates } S_i \}\right| + 1$$

The selection probability is calculated by the same formula as for NSGA selection.

3.1.5 WAR Selection

The Weighted Average Ranking consists in calculating the rank of each individual
of the population with regard to the different objectives separately [2]. The
rank of an individual is computed as the sum of its ranks for each criterion. For
this purpose, we order the individuals by increasing order of $f_1$ and $f_2$.

$$Rank(S_i) = Rank\_Makespan(S_i) + Rank\_Tardiness(S_i)$$

The selection probabilities are calculated as for NSGA selection.
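
The NDS and WAR ranks can be computed directly from the objective vectors, as in the following sketch. The function names are illustrative, and the tie-handling rule in `war_ranks` (rank = 1 + number of strictly smaller values) is an assumption, since the text does not say how ties are broken.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def dominates(u: Point, v: Point) -> bool:
    """u dominates v (minimization): no worse everywhere, strictly better somewhere."""
    return all(a <= b for a, b in zip(u, v)) and any(a < b for a, b in zip(u, v))


def nds_ranks(objs: Sequence[Point]) -> List[int]:
    """NDS rank: number of individuals dominating the individual, plus one."""
    return [1 + sum(1 for other in objs if dominates(other, point)) for point in objs]


def war_ranks(objs: Sequence[Point]) -> List[int]:
    """WAR rank: sum of the per-objective ranks (makespan rank + tardiness rank)."""
    def rank_on(k: int, point: Point) -> int:
        # Assumed tie rule: 1 + number of individuals strictly better on objective k.
        return 1 + sum(1 for other in objs if other[k] < point[k])
    return [rank_on(0, point) + rank_on(1, point) for point in objs]
```

Both functions return one rank per individual, in the order of the input list.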

3.1.6 Elitist Selection 

The elitist selection consists in maintaining an archive population PO* that
contains the best non-dominated solutions encountered during all the stages of
the search (Fig. 1). This population participates in the selection and
reproduction stages [14]. In this case, the selection probability of an
individual $S_i$ of rank $n$ (calculated with the NSGA technique) of the current
population corresponds to the following expression:

$$\pi(S_i) = \frac{\varphi - A}{\varphi} \cdot \frac{S(\varphi + 1 - R_i) + R_i - 2}{\varphi(\varphi - 1)}$$

This selection leaves a probability of $A/\varphi$ of choosing an individual from
the Pareto population. Hence, the parameter A determines the expected number of
individuals selected from the PO* set. Actually, all the selection methods keep
an archive containing the best solutions encountered during the search (Pareto
set). The particularity of elitism is to let this population participate in the
selection phase.




Fig. 1. Elitist selection strategy

To compare the performances of the 6 selection strategies implemented (Fig. 2),
evaluations are carried out on the Heller problem with 20 jobs and 10 machines
[9]. The results obtained show a considerable improvement of the search with the
introduction of elitism in the selection phase. The non-Pareto strategies,
represented here by the Weighted Sum Selection and the Parallel Selection, seem
poorly suited to the problem. The three selection strategies NSGA, NDS and WAR
have similar performances, with a slight advantage for the NSGA and NDS methods.
The parameters of the different methods are given in Table 1; the population
size is 200 and the number of generations is 15000.



Table 1. Experimental parameters of the different selection strategies

Weighted sum   λ1 = 0.5   λ2 = 0.5
NSGA           S = 1.7
NDS            S = 1.7
WAR            S = 1.7
Elitist        S = 1.7    A = 5



Fig. 2. Comparison of the different selection strategies 



Effect of the Elitism Pressure 

As the contribution of elitism has been established, it is interesting to know
the impact of the parameter A (elitism pressure) on the performance of the GA.
The value of the elitism pressure considerably influences the
exploitation/exploration balance.

Figures 3 and 4 show the evolution of the search for different values of the
elitism pressure over the generations. For low elitism pressure (A=1), the
convergence of the Pareto frontier is slow, contrary to the results obtained for
high elitism pressure (A=4). A high elitism pressure intensifies the
exploitation of the good solutions found by the search. A low elitism pressure
favors the exploration of new regions of the search space. The choice of a
suitable value is decisive for the efficiency of the search.




Fig. 3. Search evolution for A = 1 (fronts plotted at generations 1000, 3000, 5000, 7000, and 9000)




Fig. 4. Search evolution for A = 4 (fronts plotted at generations 1000, 3000, 5000, 7000, and 9000)



3.2 The Diversity Maintaining 



Classical GAs are known to be very sensitive to the choice of the initial
population and to poor sampling during selection. This weakness shows up as a
loss of diversity, also called genetic drift. To counter this drawback, several
approaches to maintaining diversity in the population have been proposed:
introduction of new random individuals, stochastic universal sampling (SUS) [1],
distance maintaining [16], crowding [16], and neighborhood restriction [5]. We
are particularly interested in the implementation of the diversification
approach based on sharing.

The sharing principle consists in degrading the fitness of individuals belonging
to regions of the search space with a high concentration of solutions. This
process favors the dispersion of solutions in the search space. The fitness of
an individual is degraded by means of a function called the sharing function sh.
The new objective function of an individual x, noted f'(x), is equal to the
original function f divided by the sharing counter m(x) (niching counter) of the
individual. The sharing counter is an estimate of how crowded the neighborhood
(niche) of individual x is:

$$m(x) = \sum_{y \in pop} sh(dist(x, y))$$

The sharing function sh measures the degree of similarity between an individual
and the remaining individuals of the population. It is defined as follows:

$$
sh(dist(x, y)) =
\begin{cases}
1 - \left(\dfrac{dist(x, y)}{\gamma}\right)^{\alpha} & \text{if } dist(x, y) < \gamma,\\
0 & \text{otherwise,}
\end{cases}
$$

where the constant $\gamma$ designates the non-similarity threshold (niche size),
i.e., the distance beyond which two individuals x and y are no longer considered
as belonging to the same niche. The constant $\alpha$ controls the shape of the
function sh. Depending on whether the distance between two individuals is
calculated in the decision space (the chromosomal representation of an
individual) or in the objective space (fitness of individuals), three approaches
have been used:



3.2.1 Genotypic Sharing 

In this approach the distance between individuals is calculated according to the
difference between chromosomes (decision space). Since a schedule is represented
by a permutation, the distance between two schedules is equal to:

$$dist1(x, y) = \left|\{ (i, j) : i \text{ precedes } j \text{ in the solution } x \text{ and } j \text{ precedes } i \text{ in } y \}\right|$$

That is, the distance between two individuals x and y is equal to the number of
order inversions between x and y.
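
A direct way to compute this distance is to count the pairs of jobs whose relative order differs between the two permutations. The sketch below is illustrative (the name `genotypic_distance` is not from the paper) and uses the straightforward quadratic enumeration of pairs.

```python
from itertools import combinations
from typing import Sequence


def genotypic_distance(x: Sequence[int], y: Sequence[int]) -> int:
    """Number of job pairs (i, j) with i before j in x but j before i in y."""
    position_in_y = {job: k for k, job in enumerate(y)}
    # combinations(x, 2) yields every pair (a, b) such that a precedes b in x.
    return sum(1 for a, b in combinations(x, 2)
               if position_in_y[a] > position_in_y[b])
```

The distance is 0 for identical schedules and reaches its maximum, n(n-1)/2 for n jobs, when one schedule is the reverse of the other.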



3.2.2 Phenotypic Sharing 

The distance in this approach is taken as the difference between individuals in
the objective space. $f_1$ and $f_2$ designate the two objective functions
"makespan" and "total tardiness".

$$dist2(x, y) = |f_1(x) - f_1(y)| + |f_2(x) - f_2(y)|$$



3.2.3 Combined Sharing 

This approach represents the combination of the first two approaches cited above.
The computation of the distance refers to both the genotype and the phenotype
distances. The function sh, in this case, takes the following form:

$$
sh(x, y) =
\begin{cases}
1 - \dfrac{dist1(x, y)}{\gamma_1} & \text{if } dist1(x, y) < \gamma_1,\; dist2(x, y) \ge \gamma_2,\\[2ex]
1 - \dfrac{dist2(x, y)}{\gamma_2} & \text{if } dist1(x, y) \ge \gamma_1,\; dist2(x, y) < \gamma_2,\\[2ex]
1 - \dfrac{dist1(x, y)\, dist2(x, y)}{\gamma_1 \gamma_2} & \text{if } dist1(x, y) < \gamma_1,\; dist2(x, y) < \gamma_2,\\[2ex]
0 & \text{otherwise,}
\end{cases}
$$

where dist1 and dist2 represent the genotype and phenotype distances respectively.
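
The sharing mechanism of Section 3.2 can be sketched as follows. This is an illustrative sketch only: the function names are assumptions, the distance function is passed in as a parameter so that either the genotypic or the phenotypic distance can be used, and the combined form follows the piecewise definition given above.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def sh(distance: float, gamma: float, alpha: float) -> float:
    """Sharing function: 1 - (d / gamma)^alpha inside the niche, 0 outside."""
    return 1.0 - (distance / gamma) ** alpha if distance < gamma else 0.0


def niche_counts(pop: Sequence[T], dist: Callable[[T, T], float],
                 gamma: float, alpha: float) -> List[float]:
    """Sharing counter m(x) for every individual x of the population.

    The sum includes y = x (distance 0, sh = 1), so m(x) is at least 1.
    """
    return [sum(sh(dist(x, y), gamma, alpha) for y in pop) for x in pop]


def shared_fitness(fitness: Sequence[float], counts: Sequence[float]) -> List[float]:
    """Degraded fitness f'(x) = f(x) / m(x)."""
    return [f / m for f, m in zip(fitness, counts)]


def combined_sh(d1: float, d2: float, gamma1: float, gamma2: float) -> float:
    """Combined sharing from the genotypic (d1) and phenotypic (d2) distances."""
    if d1 < gamma1 and d2 < gamma2:
        return 1.0 - (d1 * d2) / (gamma1 * gamma2)
    if d1 < gamma1:
        return 1.0 - d1 / gamma1
    if d2 < gamma2:
        return 1.0 - d2 / gamma2
    return 0.0
```

Individuals in crowded niches receive a larger sharing counter and therefore a more strongly degraded fitness, which is what pushes the population to spread out.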




Fig. 5. Diversification contribution (series: without sharing, genotypic sharing, phenotypic sharing, composed sharing)

To show the contribution of the different diversification methods, many tests
were executed on the Heller problem 20*10, with a generation limit of 50 000 and
using the same parameters as those described before for the elitist selection
strategy. The parameters of the diversification strategies are: $\alpha = 0.9$,
$\gamma = 4$ for the genotypic sharing and $\gamma = 1.0$ for the phenotypic
sharing.

Figure 5 reports the results. We notice that the contribution of the phenotypic
diversification is slightly better than that of the genotypic diversification.
However, the genotypic diversification is distinguished by an effective set of
solutions not found by the phenotypic diversification. The composed
diversification yields a better Pareto front than the two previous methods. Note
that the contribution of diversification appears only after a considerable
number of generations, and hence requires a very high generation limit
(>40 000).

4 Hybridization with Local Search

We are interested in the use of Local Search (LS) as a means of accelerating and
refining the genetic search. The idea is to run the GA first in order to
approach the Pareto solutions. The hybridization principle is very simple: once
the GA is over (the generation limit is reached), the Local Search is run with
the previously obtained Pareto set as its input (Figure 6).

Using Local Search first requires choosing how to generate the neighborhood of a
given solution. To construct a neighbor of a solution, a job is chosen, removed
and put somewhere else in the sequence. The jobs situated between the two
positions are shifted by one place to reconstruct the sequence.

The hybridization process consists in generating, for each individual of the
Pareto population, its neighborhood by the process described above. The
non-dominated neighbors are inserted in the Pareto population. The solutions
belonging to the Pareto population that are dominated by a neighbor of some
solution are removed. This process is repeated until no new non-dominated
neighbor is found. Figure 6 describes this Local Search process schematically.
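
The loop described above can be sketched as follows. This is a simplified illustration under stated assumptions: schedules are tuples of job indices, `evaluate` is any user-supplied function returning a tuple of objectives to minimize, neighbors whose objective vector duplicates one already in the archive are skipped (a choice made here so that the loop terminates; the paper does not discuss it), and all names are hypothetical.

```python
from typing import Callable, Dict, Iterable, Iterator, Tuple

Schedule = Tuple[int, ...]
Objectives = Tuple[float, ...]


def dominates(u: Objectives, v: Objectives) -> bool:
    """u dominates v (minimization): no worse everywhere, strictly better somewhere."""
    return all(a <= b for a, b in zip(u, v)) and any(a < b for a, b in zip(u, v))


def insertion_neighbors(perm: Schedule) -> Iterator[Schedule]:
    """All schedules obtained by removing one job and reinserting it elsewhere."""
    for i in range(len(perm)):
        rest = perm[:i] + perm[i + 1:]
        for k in range(len(perm)):
            if k != i:
                yield rest[:k] + (perm[i],) + rest[k:]


def pareto_local_search(front: Iterable[Schedule],
                        evaluate: Callable[[Schedule], Objectives]
                        ) -> Dict[Schedule, Objectives]:
    """Refine a Pareto set until no new non-dominated neighbor is found."""
    archive = {s: evaluate(s) for s in front}
    explored = set()
    improved = True
    while improved:
        improved = False
        for s in list(archive):
            if s in explored or s not in archive:
                continue
            explored.add(s)
            for nb in insertion_neighbors(s):
                if nb in archive:
                    continue
                z = evaluate(nb)
                # Skip neighbors that are dominated or duplicate a stored vector.
                if any(dominates(v, z) or v == z for v in archive.values()):
                    continue
                # Remove archive members dominated by the new neighbor.
                archive = {t: v for t, v in archive.items() if not dominates(z, v)}
                archive[nb] = z
                improved = True
    return archive
```

The returned dictionary maps each retained schedule to its objective vector. In a hybrid run, `front` would be the Pareto set produced by the GA and `evaluate` the makespan and total tardiness computation.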




Fig. 6. Local Search in action

Fig. 7. Local search contribution

The performance tests of the hybrid GA with local search show that hybridization
brings no benefit for small problems, notably Heller 20*10. However, the
hybridization is able to improve solutions as soon as the size of the problem
increases. The tests presented in figure 7 were carried out on the Heller problem
with 100 jobs and 10 machines.

5 A Parallel Model and Its Implementation

We have adopted a parallel model based on the well-known distributed island
model of GAs. This approach subdivides the population into sub-populations of
equal size. Each processor executes the GA on the sub-population assigned to it.
Periodically, after a given number of generations, the different GAs exchange
some local Pareto individuals (migration model). We have implemented a ring
communication topology in order to minimize the communication cost while keeping
the graph connected, which guarantees that a good individual may spread to all
sub-populations after a certain number of migrations. The algorithm has been
implemented in the parallel programming environment C/PVM (Parallel Virtual
Machine).

The evaluations are carried out on a cluster of workstations (Sun Ultra1). The
speedup obtained with 2, 3, 4, 6, 9 and 12 machines is almost linear. This is due
to the fact that the communication costs between machines are negligible
compared to the computing costs. The parallel algorithm makes it possible to
increase the population size as well as the number of generations of the hybrid
metaheuristic in order to generate better Pareto fronts.
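
The migration step on a ring can be illustrated with a small sequential simulation. This sketch is not the C/PVM implementation: the function name `migrate_ring` is hypothetical, individuals are compared through a scalar `key` (lower is better) instead of Pareto dominance, and all islands migrate synchronously.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


def migrate_ring(islands: List[List[T]], k: int,
                 key: Callable[[T], float]) -> List[List[T]]:
    """One synchronous migration: island i receives the k best of island i-1."""
    n = len(islands)
    # Emigrants are chosen before any island is modified.
    emigrants = [sorted(island, key=key)[:k] for island in islands]
    result = []
    for i, island in enumerate(islands):
        incoming = emigrants[(i - 1) % n]
        result.append(list(island) + [x for x in incoming if x not in island])
    return result
```

With n islands on a ring, an individual that remains among the k best of every island it reaches is present in all sub-populations after n - 1 migrations, which is the spreading property mentioned above.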



Table 3. Improvement of the quality of the solutions for (1): ma_ta11_bi and (2): ma_ta21_bi

              Sequential GA with φ=200                    Parallel GA with φ=300
Pb.   UB      MM     Dev %   MR     |PO|   Nb gen         MM     Dev %   MR     |PO|   Nb gen
(1)   1582    1586   0.25    1508   28     80000          1583   0.06    1431   32     300000
(2)   2297    2330   [...]   0.34   [...]   29   [...]

UB: best known makespan, MM: best obtained makespan, MR: minimal tardiness,
PO: Pareto optimal set.

As shown in figures 10 and 11, the use of large populations distributed over
different GAs and the increase of the number of generations improve the quality
of the Pareto frontiers.



6 Conclusion and Perspectives 

In this work, we have built our approach by progressively introducing concepts
such as selection, diversity maintaining, hybridization and parallelization. At
each stage we have shown the contribution of the mechanisms introduced, which
allows us to formulate the following conclusions.

Pareto selection strategies (NSGA, NDS, WAR) seem to be well adapted to the
multicriteria flow shop problem. The efficiency of such methods is improved by
the introduction of elitism during the selection phase. Elitism may, however,
lead to premature convergence of the search, so the parameters A (elitist
pressure) and S (selection pressure) have to be chosen adequately. Even then,
the risk of genetic drift is always present. Diversification strategies seem to
be the preferred means of preventing such problems. Three variants of the
method, based on sharing, were developed. The phenotypic sharing
(diversification in the objective space) appears to be the most interesting,
because a larger and better dispersed Pareto frontier is desirable. However, the
genotypic diversification may also yield good results. The combination of both
concepts improves the search considerably.

GAs were combined with local search in order to refine the search. The idea is
to run the GA first in order to get a first approximation of the Pareto
frontier. The local search then improves the solutions (it finds the local
optima of the search regions). The contribution of hybridization appears only
for large problems.

The work presented here deals with a bi-criteria problem. However, the proposed
method can be extended very easily to multi-criteria problems. We therefore now
need to test the strength of the method on flow-shop problems with more than two
criteria. A graphic comparison is impossible in this case, and more elaborate
performance measures must be used, such as the contribution notion [16] and
entropy [16]. The extension of the method to the general flow-shop problem (not
only the permutation flow-shop) and to job shop problems is also to be studied.
Other hybridization schemes must be evaluated. Indeed, combining GAs with more
advanced local search techniques such as tabu search may give better results.

The proposed hybrid metaheuristic is still very slow because of the advanced
mechanisms introduced. A parallel model has been proposed and implemented for
the algorithm to overcome this drawback. The speedup obtained favors the use of
large populations and the increase of the number of generations, which improves
the quality of the Pareto frontier obtained.

Abstract. Population heuristics have native abilities for solving optimization
problems with multiple objectives. The convergence to the efficient frontier is
improved when the population contains 'a good genetic information'. In the
context of combinatorial optimization problems with two objectives, the
supported solutions are used to elaborate such information, defining a
resolution principle in two phases. First, the supported efficient solution set,
or an approximation of it, is computed. Second, this information is used to
improve the performance of a population heuristic during the generation of the
efficient frontier. This principle has been tested on two classes of problems:
the $1 \mid\mid (\sum C_i, T_{\max})$ permutation scheduling problems, and the
biobjective 0-1 knapsack problems. The motivations of this principle are
developed. The numerical experiments are reported and discussed.



1 Introduction 

For some combinatorial optimization problems with two objectives, a (sub)set of
exact efficient solutions or bound sets on the efficient frontier can be computed
0. The supported efficient solutions deserve a lot of attention for two reasons.
First, these solutions are characterized and can be generated using a convex
combination of the objectives |Z]. Second, the available experimental results H21
cni show that the number of supported efficient solutions grows with the size of
the problem, but remains smaller than the number of non-supported efficient
solutions. These observations lead us to experiment with the following solving
principle: first, generation of supported solutions using an exact method or a
heuristic method and, second, use of these solutions to improve the generation
of the whole efficient frontier using a population heuristic (PH). In this
context, supported solutions are considered as a kind of 'good genetic
information' available to and used by the population heuristic. They allow a
fast convergence to the efficient frontier and maintain the diversity along the
efficient frontier. We tested this principle in different ways on two
biobjective combinatorial problems. This paper focuses on the experimental
results obtained (a complete description of the population heuristic used is
available in mini). Section 2 introduces the usual definitions and the two
classes of problem, and gives a short presentation of our population heuristic.
Section 3 reports and analyses the numerical experiments. A conclusion with a
discussion is given in section 4.



2 Background 

2.1 The Multiple Objective Combinatorial Optimization Problems 

Given a finite set $X$ and several objective functions $z^j : X \to \mathbb{R}$,
$j = 1, \dots, P$, a multiobjective combinatorial optimization (MOCO) problem is
defined as P] :

$$\underset{S \in X}{\text{``min''}} \; \left(z^1(S), \dots, z^P(S)\right) \qquad \text{(MOCO)}$$

An element $S \in X$ is a feasible decision and $X$ is called the decision space.
A vector $z(S) = (z^1(S), \dots, z^P(S))$, $z(S) \in Z$, is a performance and $Z$
is called the objective space. Typically two types of objective functions are
considered, namely the sum and the bottleneck objective. The problem is then to
solve (MOCO), where the meaning of "min" has still to be defined. Since the
objectives cannot be minimized simultaneously in general, the minimization in
(MOCO) is most often understood in the sense of efficiency (or Pareto
optimality).

Definition 1: A solution $S \in X$ is called efficient if there does not exist
another feasible solution $S' \in X$ such that $z^j(S') \le z^j(S)$ for all $j = 1, \ldots, P$
with strict inequality for at least one of the objectives.

Definition 2: The corresponding vector $z(S)$ is called non-dominated. The set
of Pareto optimal (efficient) solutions of (MOCO) will be denoted by $E$, the set
of non-dominated vectors by $ND$ in the sequel.

Definition 3: The efficient frontier is the lower left part of the shortest curve 
that connects all ND vectors. 
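
Definitions 1 and 2 can be illustrated with a minimal sketch, assuming minimization and performance vectors stored as tuples. The function names `dominates` and `non_dominated` are ours and purely illustrative; they are not taken from the paper.

```python
def dominates(a, b):
    """Return True if vector a dominates vector b (all objectives minimized)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def non_dominated(vectors):
    """Return the sorted list of non-dominated vectors (the set ND)."""
    unique = set(vectors)
    return sorted(v for v in unique
                  if not any(dominates(u, v) for u in unique))


if __name__ == "__main__":
    points = [(1, 10), (3, 7), (4, 4), (10, 1), (6, 6), (2, 12)]
    print(non_dominated(points))
```

The quadratic pairwise comparison is only meant for small examples; it makes the definition explicit rather than efficient.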

2.2 Resolution Difficulties 

In the worst case, three factors combine and contribute to the difficulty of
solving (MOCO).

(A) Computational Complexity

A (MOCO) is NP-hard if the single objective problem is NP-hard [3]. For
instance, the knapsack problem is one of the fundamental NP-complete
combinatorial optimization problems. Thus the biobjective 0-1 knapsack
problem is NP-hard, as is the single objective case. A similar result exists
for scheduling problems. As soon as one objective is recognized as NP-hard
to optimize, the multicriteria problem is also NP-hard [?]. Thus
a single machine scheduling problem with 2 objectives is difficult to solve
with an exact method in reasonable computational time in practice when
one objective is NP-hard to optimize.

(B) Solution sets

In the convex case (the $P$ functions are convex and $X$ is a convex set), $E$
coincides with the set $SE$ of supported efficient solutions. Given the weights
$\lambda^j$, $j = 1, \ldots, P$, a supported efficient solution is the optimal solution of
a single objective problem $(P_\lambda)$ corresponding to the maximization of the
weighted sum of the objective functions:

$$\max \Big\{ \sum_{j=1}^{P} \lambda^j z^j(x) \;\Big|\; x \in X \Big\} \qquad (P_\lambda)$$

Due to the discrete structure of the (MOCO) problem, the feasible domain
$X$ is generally non-convex (see Figure 1), so that $E = SE \cup NE$ where $NE$
denotes the set of non-supported efficient solutions.

(C) Number of solutions

The numerical results available on the biobjective knapsack problem [?]
show that the number of supported solutions grows linearly with the problem
size, but the number of non-supported solutions grows exponentially.
We made the same observation (see Figure 2) for a class of
biobjective permutation scheduling problems [?]. Other observations
mention that the number of efficient solutions also grows with the number of
objectives.

To summarize, (MOCO) is in the worst case an NP-hard problem, with a
non-convex feasible domain $X$ and a huge number of solutions for large multiple
objective instances. In this context, an approximation approach is a reasonable
alternative to exact methods. A solving procedure based on a (meta)heuristic
is able to cope with the factors (A) and (B) discussed above. It yields a good
tradeoff between the quality of an approximation $\widehat{E} = \widehat{SE} \cup \widehat{NE}$ of the exact
efficient solution set $E$, and the time and memory requirements.
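
To make the distinction between $SE$ and $NE$ concrete, the following sketch splits a set of biobjective non-dominated vectors into supported and non-supported ones. It is only an illustration under our own assumptions: both objectives are minimized (as in (MOCO)), the vectors have integer components, and the weights are normalized as $(\lambda, 1 - \lambda)$ with $\lambda \in [0, 1]$. The names `is_supported` and `split_supported` are hypothetical.

```python
from fractions import Fraction


def is_supported(p, nd):
    """True if p minimizes l*z1 + (1-l)*z2 over nd for some l in [0, 1]."""
    lo, hi = Fraction(0), Fraction(1)
    for q in nd:
        # p must satisfy l*a <= b to be at least as good as q for weight l.
        a = (p[0] - q[0]) - (p[1] - q[1])
        b = q[1] - p[1]
        if a > 0:
            hi = min(hi, Fraction(b, a))
        elif a < 0:
            lo = max(lo, Fraction(b, a))
        elif b < 0:
            return False  # q beats p for every weight
    return lo <= hi


def split_supported(nd):
    """Split non-dominated vectors into (supported, non-supported) lists."""
    supported = [p for p in nd if is_supported(p, nd)]
    non_supported = [p for p in nd if p not in supported]
    return supported, non_supported


if __name__ == "__main__":
    nd = [(1, 10), (3, 7), (4, 4), (10, 1)]
    print(split_supported(nd))
```

In this example the vector (3, 7) is efficient but lies above the segment joining (1, 10) and (4, 4), so no weighted sum can select it; a heuristic that only optimizes weighted sums would never detect it.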



2.3 The 1 | | (ΣCi, Tmax) Permutation Scheduling Problems

X. Gandibleux, H. Morita, and N. Katoh

Fig. 1. Supported (square) and non-supported (circle) efficient solutions

Fig. 2. Input size and numbers of efficient solutions

We consider a set of $n$ independent jobs to be scheduled on a single machine,
which can handle no more than one job at a time. The machine is assumed to
be available at time 0. Job $J_i$ ($i = 1, \ldots, n$) requires processing during a given
non-preemptive time $p_i$; to each job is assigned a due date $d_i$, at which $J_i$ should
be finished. Moreover, if a relative importance between jobs has to be considered,
a weight $w_i$ is assigned to each job. It is assumed that all values $p_i$, $d_i$ and $w_i$
are positive integers. A schedule $\sigma$ defines for each job $J_i$ a completion time
$C_i(\sigma)$ such that the technical constraints (capacity and availability) of the machine
are respected. We consider that the quality of a schedule is measured in terms of
two objectives $z^j : \sigma \to \mathbb{N}$, $j = 1, 2$, both to be minimized; $z^1$ and
$z^2$ are two regular performance criteria defined on $\sigma$:

- the total flow time: $\sum C_i$, defined as $\sum_{i=1}^{n} C_i(\sigma)$

- the maximum tardiness: $T_{\max}$, defined as $\max_{1 \le i \le n} T_i$

where the tardiness $T_i$ of job $i$ is defined as $\max\{C_i(\sigma) - d_i, 0\}$.

According to the classical notation of [?], this class of biobjective single-
machine scheduling problems is denoted by $1 \,|\,|\, (\sum C_i, T_{\max})$ (biobjective single
machine permutation scheduling problems minimizing both total flow time and
maximum tardiness simultaneously). Since we are considering regular
performance measures, a schedule is determined by a permutation $\sigma$ over $\{1, 2, \ldots, n\}$.
Given a schedule $\sigma$, the completion time $C_i(\sigma)$ of job $i$ is uniquely determined and
is denoted by $C_i$ for simplicity in the sequel.

This problem has received a lot of attention. Van Wassenhove and Gelders
[?] have proposed a pseudo-polynomial algorithm for generating the exact set
of efficient solutions. This class of problems has been chosen in this paper not
to put a heuristic method in competition with an exact method, but to obtain
experimental feedback easily.
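
As an illustration of the two criteria, the sketch below evaluates a permutation schedule on a single machine available at time 0. It assumes 0-based job indices and jobs processed without idle time; the function name `evaluate` and the small instance are ours, not the paper's.

```python
def evaluate(perm, p, d):
    """Return (total flow time, maximum tardiness) of a permutation schedule.

    perm: job indices in processing order; p: processing times; d: due dates.
    """
    clock = 0
    total_flow = 0
    t_max = 0  # starting at 0 implements T_i = max{C_i - d_i, 0}
    for job in perm:
        clock += p[job]              # completion time C_i of this job
        total_flow += clock
        t_max = max(t_max, clock - d[job])
    return total_flow, t_max


if __name__ == "__main__":
    p = [3, 1, 2]
    d = [4, 2, 6]
    print(evaluate([1, 2, 0], p, d))  # shortest processing time first
    print(evaluate([1, 0, 2], p, d))  # earliest due date first
```

On this toy instance the first order gives the smaller total flow time and the second the smaller maximum tardiness, which shows why the two objectives conflict.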

2.4 The Biobjective Knapsack Problems 

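We consider the 0-1 MultiObjective Knapsack Problem (0-1MOKP) where the coefficients
$c_i^j$, $w_i$ and $\omega$ are nonnegative constants. Since all coefficients are nonnegative,
if $w_i > \omega$ or $c_i^j = 0$, $j = 1, \ldots, P$, then the variable $x_i$ can be fixed at 0 in any
efficient solution $x \in E$, while if $\sum_{i=1}^{N} w_i \le \omega$ then $E = \{e\}$ where $e = (1, 1, \ldots, 1)^t$
is the vector of all 1's. The biobjective case ($P = 2$) is denoted by (BiKP).

$$\begin{array}{lll}
\text{``max''} & z^j(x) = \sum_{i=1}^{N} c_i^j x_i & j = 1, \ldots, P \\
\text{s.t.} & \sum_{i=1}^{N} w_i x_i \le \omega & \\
 & x_i \in \{0, 1\} & i = 1, \ldots, N
\end{array} \qquad \text{(0-1MOKP)}$$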

With one or multiple objectives, the 0-1 knapsack problem has received considerable
attention from researchers in the last two decades. All the papers
concerning the resolution of the (0-1MOKP) deal with the problem of identifying
or approximating $E$. The proposed algorithms are either based on implicit
enumeration methods such as dynamic programming or branch and bound, or apply
heuristic procedures, especially metaheuristics like Simulated Annealing, Tabu
Search, Genetic Algorithms, and others to approximate $E$ (see [?]).
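
For very small instances the efficient set of (BiKP) can be obtained by complete enumeration, which gives a reference against which an approximation can be checked. The sketch below is such a brute-force illustration, not one of the enumeration methods cited above; the function names and the three-item instance are assumptions of ours.

```python
from itertools import product


def dominates_max(a, b):
    """True if vector a dominates vector b when all objectives are maximized."""
    return (all(x >= y for x, y in zip(a, b))
            and any(x > y for x, y in zip(a, b)))


def efficient_set(c1, c2, w, omega):
    """Enumerate all 0-1 vectors and keep the efficient feasible ones.

    Returns a dict mapping each efficient solution x to its vector (z1, z2).
    Exponential in the number of items: for toy instances only.
    """
    feasible = {}
    for x in product((0, 1), repeat=len(w)):
        if sum(wi * xi for wi, xi in zip(w, x)) <= omega:
            z1 = sum(c * xi for c, xi in zip(c1, x))
            z2 = sum(c * xi for c, xi in zip(c2, x))
            feasible[x] = (z1, z2)
    return {x: z for x, z in feasible.items()
            if not any(dominates_max(other, z) for other in feasible.values())}


if __name__ == "__main__":
    print(efficient_set([4, 3, 2], [1, 3, 4], [2, 2, 2], 4))
```

In this instance the solution with vector (6, 5) is efficient but lies below the segment joining (7, 4) and (5, 7), so it is a non-supported solution of the maximization problem.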

2.5 Overview of Our Population Heuristic 

A genetic algorithm is a population heuristic built on iterative solution
techniques that handle a population of individuals and make them evolve according
to some specified rules. Self-adaptation and co-operation are the fundamental
mechanisms applied to the individuals at each iteration [?]. Genetic algorithms
have been recognized as naturally well suited to multiobjective optimization
since they keep multiple solutions in parallel.

Classical GAs did not incorporate any problem-specific knowledge. However,
incorporating knowledge improves the computational effectiveness of the
heuristic. The term hybrid genetic algorithm was introduced to distinguish such
methods from a classical GA. An equivalent term is memetic algorithm. A hybrid GA is also
sometimes called a genetic local search algorithm [?].

A GA proceeds in general by keeping a set of solutions (the population), and by
performing crossover, mutation, and selection operations [?]. The major
difference between single-objective and multiobjective GAs lies in the way
individuals are selected for the next generation. In a single-objective GA an individual can
be evaluated according to the single objective function. In a multiobjective GA
it is desired to obtain a set of individuals that is uniformly distributed over the efficient
frontier and approximates the set of non-dominated vectors well.

Considering the results presented in [?], the main strategies of our
population heuristic (called MGK for Morita/Gandibleux/Katoh) are described briefly
(a complete description of the population heuristic used is given in [?]):

a) All solutions ranked one by Goldberg’s method [?] are kept for the next
generation, because computing them again is expensive in CPU time.

b) (Schaffer’s strategy) When selection is performed, a few good solutions in
the current population with respect to each objective are kept for the next
generation even if their ranks are greater than one.

c) Among the solutions not selected in (a) and (b), the solutions kept for the next
generation are determined by tournament selection based on the domination
relationship with sharing. We use this strategy in the hope of keeping the diversity
of individuals.

d) (Seeding strategy) In addition to the set of randomly generated individuals,
we initially add to the current population a few solutions that are good with
respect to each objective. Such good solutions are computed by an exact or
heuristic algorithm, depending on the computational difficulty of the
corresponding single-objective scheduling problem. Using this strategy, we can obtain
good solutions more quickly.

e) To improve the convergence to non-dominated vectors, a local search method
is incorporated into our algorithm. At any generation, the local search
method is applied to all elite individuals except those to which local search has
already been applied in previous generations. In this local search method,
all neighborhood solutions that are not dominated by a tentative
approximate non-dominated solution are kept. It is thus possible to keep
a number of approximate non-dominated solutions at a time.
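
Strategies (a) and (b) can be sketched as follows. This is a rough illustration under our own assumptions (minimization, objective vectors as tuples, ranking by successive non-dominated fronts as in Goldberg's method) and not the MGK implementation; the helper names are hypothetical.

```python
def dominates(a, b):
    """True if vector a dominates vector b (all objectives minimized)."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def goldberg_ranks(vectors):
    """Rank 1 for the non-dominated front, rank 2 for the next front, etc."""
    ranks = [0] * len(vectors)
    remaining = set(range(len(vectors)))
    rank = 1
    while remaining:
        front = {i for i in remaining
                 if not any(dominates(vectors[j], vectors[i])
                            for j in remaining)}
        for i in front:
            ranks[i] = rank
        remaining -= front
        rank += 1
    return ranks


def best_per_objective(vectors, k=1):
    """Indices of the k best individuals for each objective (strategy b)."""
    keep = set()
    for j in range(len(vectors[0])):
        order = sorted(range(len(vectors)), key=lambda i: vectors[i][j])
        keep.update(order[:k])
    return keep


if __name__ == "__main__":
    pop = [(1, 10), (3, 7), (4, 4), (10, 1), (6, 6), (2, 12), (7, 7)]
    print(goldberg_ranks(pop))
    print(best_per_objective(pop))
```

Individuals of rank one would be carried over under strategy (a), and the indices returned by `best_per_objective` under strategy (b); tournament selection with sharing (strategy c) is not covered here.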

3 Numerical Experiments 

In these experiments, a standard PC (Celeron 300A) is used and compilation is
done with gcc (ver. 2.5.8). All the numerical instances are available on our
web site [?]. We report several pieces of information, especially the detection ratio, i.e.
the share of the efficient solutions in $E$ that MGK can find. The seeding solutions
used in the sequel come from the supported efficient solution set.

3.1 The 1 | | (ΣCi, Tmax) Permutation Scheduling Problems

We have randomly generated 10 problem instances for each of five different
numbers of jobs, i.e., n = 100, 200, 300, 400, 500. Exact efficient solutions are
computed using the algorithm of [?]. To carry out our analysis, MGK has
been ‘forced’ to detect all efficient solutions. For all instances, our population
heuristic is able to find all efficient solutions, often in reasonable computational
time, up to 500 jobs. Nevertheless, the generation of $E$ seems more difficult for
the sets Dat3 and Dat4. A possible explanation is the large number of
efficient solutions, especially non-supported ones, to generate.

Resolution with Two Seeding Solutions. For $1 \,|\,|\, (\sum C_i, T_{\max})$, we can
compute an exact solution for each objective [?]. $1 \,|\,|\, \sum C_i$ and $1 \,|\,|\, T_{\max}$ can be
optimally solved by Smith’s rule and by the EDD rule, respectively. These
solutions are therefore incorporated in the initial solution set. Tab. 1 illustrates the results
for all instances using two seeding solutions. Fig. 3 and Fig. 4 illustrate the
individuals after 50 generations and 500 generations respectively. As the figures show, the
algorithm can spread individuals over the entire efficient frontier, because the seeding
solutions can propagate their superior genetic information to other individuals within
a small number of generations. Fig. 5 zooms in on the dotted square plotted in
Fig. 4. The individuals come near the efficient frontier, and most of the individuals
are efficient solutions.
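
The two seeding solutions can be sketched as follows. We assume here that, for the unweighted total flow time, Smith's rule reduces to ordering jobs by nondecreasing processing time, and that the EDD rule orders jobs by nondecreasing due date. The function names, the population layout and the use of a seeded random generator are illustrative choices of ours.

```python
import random


def smith_order(p):
    """Smith's rule with unit weights: shortest processing time first."""
    return sorted(range(len(p)), key=lambda i: p[i])


def edd_order(d):
    """EDD rule: earliest due date first (minimizes maximum tardiness)."""
    return sorted(range(len(d)), key=lambda i: d[i])


def initial_population(p, d, size, seed=0):
    """Two seeding permutations followed by random permutations."""
    rng = random.Random(seed)
    population = [smith_order(p), edd_order(d)]
    while len(population) < size:
        perm = list(range(len(p)))
        rng.shuffle(perm)
        population.append(perm)
    return population


if __name__ == "__main__":
    p = [3, 1, 2]
    d = [4, 2, 6]
    for individual in initial_population(p, d, size=5):
        print(individual)
```

Both sorts are stable, so ties between jobs are broken by job index; any other tie-breaking rule would serve equally well for seeding.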

Resolution using whole SE as Seeding Solutions. In computing the convex
hull for $E$, the subsets $SE$ and $NE$ are identified. We now experiment with the case
where the whole of $SE$ is used as seeding solutions. Tab. 2 illustrates the results for all
instances when all $SE$ solutions are used in the initial population. It shows that
the good genetic information of the $SE$ solutions has a strong impact on finding all
efficient solutions. We notice that the number of generations and the computational time
become about half of those obtained using two seeding solutions. From these results, we
can see that it is very effective to use the genetic information of $SE$ solutions
as much as possible.

Fig. 3. With two seeding solutions after 50 generations (300jobs/dat3); horizontal axis: total flow time
Fig. 4. With two seeding solutions after 500 generations (300jobs/dat3); horizontal axis: total flow time

Table 1. Computational results for all problem instances using two seeding solutions
(gen.: number of generations; CPUt: CPU time in seconds)

              100 jobs         200 jobs         300 jobs         400 jobs         500 jobs
        gen.      CPUt   gen.      CPUt   gen.      CPUt   gen.      CPUt   gen.      CPUt
Dat0      33      0.62     32       9.7     40      35.4     74     106.5     42     174.6
Dat1      24      0.72     70      11.4    134     168.3    312    1236.2    611     493.6
Dat2      80      4.29    145      84.7    344     926.6    624    2589.8    539    9052.1
Dat3     110      4.91    621     536.8    694   2087.47   1901   25003.3   3054   75318.5
Dat4     140     16.65    742     640.9   1419    5708.2   1791   29341.4   2876  102414.4
Dat5      10      0.11     15       1.9     13       4.5     11       7.4     14      24.4
Dat6      10      0.02     22       2.2     35      14.3     61      85.6     34      77.5
Dat7      20      0.07     44      11.0     38      24.5    206     580.6    135     865.3
Dat8      20       0.2     59      21.1     68      85.6    245     992.0    692    4871.4
Dat9      31      1.04     37      25.5    183     339.2    419    1266.4    699    6000.6
Avg     47.8      2.86  178.7     134.5  296.8     939.4  564.4    6120.9  869.6   20370.2
Min       10      0.02     15       1.9     13       4.5     11       7.4     14      24.4
Max      140     16.65    742     640.9   1419    5708.2   1901   29341.4   3054  102414.4

Table 2. Computational results for all problem instances using all SE
(gen.: number of generations; CPUt: CPU time in seconds)

              100 jobs         200 jobs         300 jobs         400 jobs         500 jobs
        gen.      CPUt   gen.      CPUt   gen.      CPUt   gen.      CPUt   gen.      CPUt
Dat0      11       0.4     14       4.1     14      20.2     23      44.0     15     154.5
Dat1      10       0.1     12       5.3     29      91.5    112     490.4    522    2920.9
Dat2      25       2.2     55      49.0    144     372.4    371    1393.5    722    6225.2
Dat3      27       2.7    196     153.6    470    1236.7   1072    7915.1   1216   27291.3
Dat4      51       8.4    220     310.3   1485    3946.3    770   16020.1   1150   66989.5
Dat5      11       0.1     12       1.2     10       0.1     10      0.25     10       0.4
Dat6      10      0.03     10       0.1     11      11.1     14      56.7     12      69.7
Dat7      10      0.04     16       4.9     12      14.5     46     315.0     36     420.3
Dat8      11       0.6     34      18.7     25      42.0    120     459.3    407    2585.6
Dat9      12       0.9     55      26.6     59     154.9    211     760.7    327    3336.5
Avg     17.8       1.5   62.4      57.4  225.9     589.0  274.9   2745.55  441.7   10999.4
Min       10      0.03     10       0.1     10       0.1     10       0.3     10       0.4
Max       51       8.4    220     310.3   1485    3946.3   1072   16020.1   1216   66989.5


3.2 The Biobjective Knapsack Problems ([?])

We use 10 randomly generated problem instances. The right-hand side $\omega$
of the constraint $\sum_{i=1}^{N} w_i x_i \le \omega$ is correlated with the vector $w$ as follows:
$\omega = 0.5 \times \sum_{i=1}^{N} w_i$. For these problems the efficient set $E$ was available and was
obtained by the Branch & Bound based procedure of Ulungu and Teghem [?].



Resolution using whole SE as Seeding Solutions. In our experiments,
the exact supported efficient solutions are computed by solving weighted sum
problems using the algorithm of Martello and Toth [?] in a dichotomic procedure
(see Algorithm 1).

These solutions are incorporated in the initial population. Tab. 3
summarizes the results for all instances. The detection ratio becomes very good after
a small number of generations: 75 % of the exact solutions are detected after only
1500 iterations. Moreover, the quality of the approximation is also very good
(see Figure 6), with of course all the potential solutions generated lying inside
the triangles defined by two adjacent supported solutions.

Fig. 5. Zoom over the efficient frontier approximation after 500 generations
(300jobs/dat3); horizontal axis: total flow time

Fig. 6. A zoom on the potential efficient solutions obtained at the end of the generation
when SE is used in the initial population (instance KP300/series B of [12])

Algorithm 1

firstPhase: procedure ()
  --| Compute x^(1) and x^(2), the lexicographically optimal solutions for
  --| permutations (z^1, z^2) and (z^2, z^1) of the objectives.
  x^(1) <- SolveLexicographic(z1 ↓, z2 ↓)
  x^(2) <- SolveLexicographic(z2 ↓, z1 ↓)
  S <- {x^(1), x^(2)}
  --| Compute all supported solutions between x^(1) and x^(2).
  --| Update S with all new solutions generated.
  solveRecursion(x^(1) ↓, x^(2) ↓, S ↕)
end firstPhase

solveRecursion: procedure (x^(A) ↓, x^(B) ↓, S ↕)
  --| Compute the optimal solution x^(C) of (P_λ):
  --| min{λ1 z^1(x) + λ2 z^2(x) | x ∈ X}, where
  --| λ1 = z^2(x^(A)) - z^2(x^(B)) and λ2 = z^1(x^(B)) - z^1(x^(A)).
  x^(C) <- SolveP_λ(λ ↓)
  if x^(C) ∉ S then
    S <- S ∪ {x^(C)}
    solveRecursion(x^(A) ↓, x^(C) ↓, S ↕)
    solveRecursion(x^(C) ↓, x^(B) ↓, S ↕)
  end if
end solveRecursion
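
The recursion of Algorithm 1 can be tried out on an explicit list of biobjective vectors. The sketch below follows the minimization form printed in the algorithm and replaces the weighted sum solver by a plain scan of the list, which is an assumption of ours made only to keep the example self-contained; the paper solves the weighted sum problems with the algorithm of Martello and Toth. All function names are hypothetical.

```python
def lex_min(points, first, second):
    """Lexicographic minimum: minimize objective `first`, then `second`."""
    return min(points, key=lambda z: (z[first], z[second]))


def weighted_min(points, l1, l2):
    """Stand-in for the weighted sum solver: scan the explicit list."""
    return min(points, key=lambda z: l1 * z[0] + l2 * z[1])


def solve_recursion(points, a, b, found):
    """Search for a supported vector between a and b, then recurse."""
    l1 = a[1] - b[1]   # lambda_1 = z^2(x^(A)) - z^2(x^(B))
    l2 = b[0] - a[0]   # lambda_2 = z^1(x^(B)) - z^1(x^(A))
    c = weighted_min(points, l1, l2)
    if c not in found:
        found.add(c)
        solve_recursion(points, a, c, found)
        solve_recursion(points, c, b, found)


def first_phase(points):
    """Return the supported vectors detected by the dichotomic search."""
    a = lex_min(points, 0, 1)
    b = lex_min(points, 1, 0)
    found = {a, b}
    if a != b:
        solve_recursion(points, a, b, found)
    return sorted(found)


if __name__ == "__main__":
    zs = [(1, 10), (3, 7), (4, 4), (10, 1), (6, 6), (2, 12)]
    print(first_phase(zs))
```

As in Algorithm 1, a branch stops as soon as the solver returns a vector already in the set, so when several vectors tie for the same weighted sum only some of them may be reported.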



Table 3. Detection ratio and CPU time for each problem instance using all SE
(rows: number of generations; columns: number of items; entries: detection ratio in %)

Generations      100      150      200      250      300      350      400      450      500
100            79.65    78.69    63.55    65.18    57.08    52.70     53.7    47.59    51.69
200            82.56    85.25    67.20    75.04    63.53    61.19     61.2    56.90    59.13
300            87.79    86.89    71.75    78.22    66.06    65.33     64.7    60.41    61.82
400            91.28    90.57    72.89    80.45    69.57    67.62     67.1    63.24    63.82
500            93.60    91.39    76.54    81.88    71.67    70.38     69.7    65.24    66.16
600            94.19    91.39    77.90    82.99    74.61    72.10     70.4    67.38    67.68
700            94.19    92.62    80.87    84.10    77.00    73.25     72.4    68.55    69.26
800            94.19    92.62    81.09    84.90    77.98    74.17     73.9    69.45    70.37
900            94.19    92.62    82.46    85.21    79.24    75.89     75.0    70.76    71.33
1000           94.19    93.03    83.60    85.53    80.08    76.69     75.6    71.79    72.71
2000           98.26    94.67    87.70    88.71    85.55    83.58     81.3    78.41    80.01
3000           98.84    97.13    89.98    90.62    89.06    87.26     85.5    82.69    83.25
4000          100.00    97.54    92.26    92.37    90.60    89.32     87.0    85.52    86.22
5000               -    97.95    93.62    92.69    92.15    91.39     88.5    87.66    87.66
6000               -    97.95    94.31    93.32    92.71    91.85     89.9    89.24     88.7
7000               -    98.77    94.76    93.80    94.11    92.65     90.1    90.28     89.8
8000               -    98.77    94.99    94.59    94.39    93.23     90.6    90.76    91.18
9000               -    98.77    95.22    94.91    94.53    94.26     91.2    91.38    92.14
10000              -    98.77    95.44    95.07    94.81    94.72     91.7    91.72    92.63
CPUt           199.7    982.7   2686.4   6208.6   7428.3  14129.9  17291.9  46046.7  49508.1

Table 4. Detection ratio and CPU time for each problem instance using all $\widehat{SE}$
(rows: number of generations; columns: number of items; entries: detection ratio in %)

Generations      100      150      200      250      300      350      400      450      500
100             6.40     4.10     0.23     1.75     0.14     0.00      0.0      0.0      0.0
200            71.51    70.49     3.64    42.93    14.45     1.38      7.2     5.31     0.21
300            78.49    80.33    39.86    67.41    38.15    23.08     41.4    26.76    18.33
400            80.23    82.79    65.38    74.24    54.28    51.21     60.4    44.83    40.45
500            81.98    84.43    69.48    75.99    67.46    61.77     64.4    53.52    48.66
600            82.56    85.66    73.58    77.27    72.37    66.02     69.6    58.76    54.65
700            83.14    86.89    76.77    78.54    75.46    68.54     72.2    61.59    57.06
800            83.72    87.70    79.04    79.17    76.72    69.80     73.5    63.38    59.27
900            86.05    88.52    79.50    80.76    77.70    70.84     74.9    64.48    60.72
1000           86.63    88.52    80.87    81.40    78.40    71.53     75.8    65.86    62.44
2000           91.28    90.98    87.24    86.49    85.13    81.29     83.3    74.07    73.47
3000           91.28    93.85    89.98    89.03    87.24    84.96     85.7    78.21    80.08
4000           91.28    95.49    91.34    89.98    88.50    87.60     87.9    80.55    82.91
5000           91.86    96.72    93.62    89.98    88.64    89.78     89.0    83.10    84.36
6000           94.19    97.54    94.99    90.46    89.62    90.59     89.9    83.93    85.67
7000           94.77    98.36    95.90    90.78    90.18    91.39     90.8    85.52    87.53
8000           95.93    98.36    96.13    91.57    90.88    92.08     91.0    86.69     88.7
9000           96.51    98.36    96.36    92.05    91.02    92.42     91.8    87.93    89.59
10000          96.51    98.36    96.36    92.37    91.30    92.65     92.4    88.69    90.35
CPUt           501.2   1277.3   2724.1   7728.3   8595.1  17999.6  25392.7  42836.5  46673.7


Resolution using an Approximation of SE as Seeding Solutions. The
exact supported solution set $SE$ is replaced by an approximation $\widehat{SE}$ obtained
with the following greedy algorithm. For some values of $\lambda \in [0, 1]$, $c_i(\lambda) =
\lambda c_i^1 + (1 - \lambda) c_i^2$ is computed and we fix at 1 the variables $x_i$, $i = 1, \ldots, N$, according
to a decreasing order of the $c_i(\lambda) / w_i$ ratios, as long as $\sum_{i=1}^{N} w_i x_i \le \omega$.

Here, at least 75 % of the exact solutions are detected after 2500 iterations.
Focusing on the analysis of 3 numerical instances (100, 300 and 500 items),
we notice the same behavior in terms of detection progression when $SE$ or $\widehat{SE}$
is used. The usage of $\widehat{SE}$ only introduces a delay in the detection for a given
number of generations. This is not a surprise because the genetic information is
obviously better in $SE$.
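
The greedy construction can be sketched as follows. Two points are our own reading rather than statements of the paper: filling stops at the first item that no longer fits (our interpretation of "as long as"), and all weights are assumed strictly positive so that the ratios are defined. The evenly spaced values of $\lambda$ and the function names are also illustrative.

```python
def greedy_seed(c1, c2, w, omega, lam):
    """Greedy 0-1 solution for the weighted objective lam*c1 + (1-lam)*c2."""
    n = len(w)
    order = sorted(range(n),
                   key=lambda i: (lam * c1[i] + (1 - lam) * c2[i]) / w[i],
                   reverse=True)
    x = [0] * n
    load = 0
    for i in order:
        if load + w[i] > omega:
            break  # assumption: stop at the first item that does not fit
        x[i] = 1
        load += w[i]
    return tuple(x)


def approximate_supported(c1, c2, w, omega, steps=10):
    """Distinct greedy solutions for lam = 0, 1/steps, ..., 1."""
    return {greedy_seed(c1, c2, w, omega, k / steps)
            for k in range(steps + 1)}


if __name__ == "__main__":
    seeds = approximate_supported([4, 3, 2], [1, 3, 4], [2, 2, 2], 4, steps=2)
    print(sorted(seeds))
```

Each returned vector is feasible by construction but is not guaranteed to be efficient, which is consistent with its use as an approximation of $SE$.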

4 Conclusion 

The resolution of a (MOCO) problem can be decomposed into two phases. First,
the solving procedure tries to generate the supported efficient solutions using
an exact method or an approximation method. Afterwards, the information
obtained is advantageously used as ‘good genetic information’ to identify the
non-supported solutions by a population heuristic. The numerical
experiments reported in this paper show the effectiveness of the population heuristic
when the initial population contains information derived from $SE$ solutions.

Certainly, such a hybrid or ‘semi-exact’ method is especially attractive for
problems that can be efficiently solved as single objective combinatorial
problems, like the assignment problem, or when an efficient greedy algorithm is
available, as for the knapsack problem.

Fig. 7. Influence of using $SE$ or $\widehat{SE}$ on the detection ratio

This approach has natural qualities such as a fast convergence to the efficient
frontier, a uniform distribution of the potential efficient solutions obtained along
the whole efficient frontier, and a good quality due to their closeness to the
exact solutions.

Abstract. This paper deals with the analysis of genetic operators for
a multi-objective flow-shop problem. The analysis is based on the
influence of parental distance on the number of non-dominated solutions
that are generated. The results of the analysis allow us to select the best
combination of operators to deal with a specific problem. Simulation
results show that using our design approach we can easily improve specific
results recently available in the literature.



1 Introduction 

Real world optimization problems are usually multi-objective (MO) in nature.
The lack of methodologies to tackle these problems makes them attractive both
theoretically and practically.

The research community in economics has been the pioneer in the study of
multi-criteria analysis and optimization. In this field, Pareto’s work is the
milestone. Even though continuous multi-objective problems have received a lot of
attention, the discrete case (Integer Programming MO) has received little
attention.

Among the discrete MO problems, scheduling seems to be one of the most
challenging. In a real scheduling problem we are interested not only in
minimizing the latest completion time (makespan) but also in minimizing the total
time by which all jobs exceed their respective due dates.

On the other hand, the available classical methodologies in genetic algorithms
(GA’s) have focused on function optimization rather than on combinatorial
optimization problems (COP’s). The few works on MO scheduling with
single-objective-like approaches show that there is much to do in this research area.

To the best of our knowledge, the study of operators and their relation with the
dominance properties of solutions, for scheduling problems, has not been
investigated.

This paper is a first small step in trying to understand these relations.

The remainder of the paper is organized as follows. Section 2 states the
problem we are dealing with. Section 3 reviews some available results for this
problem. Section 4 defines the distance measures we are going to use in our
analysis procedure. Section 5 describes the algorithm we use. Section 6 shows
the experimental results. Finally, Section 7 presents the summary of this work.



E. Zitzler et al. (Eds.): EMO 2001, LNCS 1993, pp. 443-[?], 2001.
(c) Springer-Verlag Berlin Heidelberg 2001

2 Problem Statement

The permutation flow-shop problem consists of a set $J$ of $n$ jobs that must be
processed on a set of machines $M$. Each job $j \in J$ has $m = |M|$ operations.
Each operation $O_{kj}$, representing the $k$-th operation of job $j$, has an associated
processing time $t_{kj}$. Once a machine starts processing an operation, it must
finish it (no preemption allowed). No machine can process more than one
operation at the same time. No operation can be processed by more than one
machine at a time. Each job $j$ is assigned a readiness time $r_j$ and a due date $d_j$.
All jobs must have the same routing through all machines. The goal is to find a
permutation of jobs that minimizes a given objective function (since the order
of machines is irrelevant).

In order to understand the objective functions we want to optimize, we need
to set up some notation first. Let us denote the starting time of operation $O_{kj}$
by $s_{kj}$ and its completion time by $c_{kj}$. Define $K_m$ as the set $\{1, 2, \ldots, m\}$. With this
notation a feasible solution satisfies the following conditions:

$$s_{kj} \ge r_j \quad \forall k \in K_m,\ j \in J, \qquad (1)$$

$$s_{kj} + t_{kj} \le s_{k+1,j} \quad \forall k \in K_{m-1},\ j \in J. \qquad (2)$$

All pairs of operations $O_{kj}$ and $O_{ri}$ processed on the same machine must satisfy:

$$s_{kj} + t_{kj} \le s_{ri} \quad \text{or} \quad s_{ri} + t_{ri} \le s_{kj} \quad \text{for each machine in } M,\ k \ne r \text{ or } j \ne i. \qquad (3)$$

Now we are in a position to define the objective functions. First we consider
the makespan, which is the completion time of the latest job, i.e.

$$f_1 = \max_{j} \{ s_{mj} + t_{mj} \}. \qquad (4)$$

The mean flow time, representing the average value of the time during which
the jobs remain in the shop, is the second objective:

$$f_2 = \overline{fl} = (1/n) \sum_{j=1}^{n} fl_j, \qquad (5)$$

where $fl_j = s_{mj} + t_{mj} - r_j$, i.e. the time job $j$ spends in the shop after
it is released. The third objective is the mean tardiness, i.e.

$$f_3 = \overline{T} = (1/n) \sum_{j=1}^{n} T_j, \qquad (6)$$

where $T_j = \max\{0, L_j\}$ and $L_j = s_{mj} + t_{mj} - d_j$.

Thus, we have the following MO problem:

$$\min (f_1, f_2, f_3) \quad \text{subject to (1)-(3)}. \qquad (7)$$
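
The three objectives can be computed for a given job permutation with the sketch below. It assumes that every operation starts as early as constraints (1)-(3) allow (a semi-active schedule) and that jobs visit the machines in the order 1 to m; this decoding rule and all names are assumptions of ours made for illustration.

```python
def flow_shop_objectives(perm, t, r, d):
    """Return (makespan, mean flow time, mean tardiness) of a permutation.

    t[k][j]: processing time of job j on machine k; r[j]: readiness time;
    d[j]: due date. Jobs and machines are indexed from 0.
    """
    m = len(t)
    machine_free = [0] * m      # time at which each machine becomes idle
    completion = {}
    for j in perm:
        ready = r[j]            # constraint (1): no start before release
        for k in range(m):
            start = max(ready, machine_free[k])   # constraints (2) and (3)
            ready = start + t[k][j]
            machine_free[k] = ready
        completion[j] = ready   # equals s_mj + t_mj
    n = len(perm)
    makespan = max(completion.values())
    mean_flow = sum(completion[j] - r[j] for j in perm) / n
    mean_tardiness = sum(max(0, completion[j] - d[j]) for j in perm) / n
    return makespan, mean_flow, mean_tardiness


if __name__ == "__main__":
    t = [[2, 1], [1, 3]]   # two machines, two jobs
    r = [0, 0]
    d = [3, 4]
    print(flow_shop_objectives([0, 1], t, r, d))
    print(flow_shop_objectives([1, 0], t, r, d))
```

In this two-job instance the two permutations have the same mean flow time and mean tardiness but different makespans, so one of them dominates the other.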

3 GA’s Approach to MO Scheduling Problems

There are many approaches for solving the general MO problem by using GA’s.
Surveys on the existing GA methodologies can be found in [?], [?], [?], and
references therein. Almost every application uses the methodologies described in
these surveys.

Since this is a new research area, there are still many fundamental questions
to be answered. Especially in the field of MO-COP’s, everything remains to be done. To
date, one of the most pragmatic questions to answer is how to fairly compare two
given methodologies or, in the best case, how to judge any given methodology.

The application of GA’s to MO scheduling problems has been rather scarce.
Two interesting ideas are those presented in [?] and [?].

In [?] the scheduling of identical parallel machines is presented, considering as objective
functions the maximum flow time among machines and a non-linear function
of the tardiness and earliness of jobs. In [?] a natural extension
of NSGA [?] is presented and applied to flow-shop and job-shop scheduling
problems. Another, totally different approach is that presented by Ishibuchi and
Murata [?]. They use a local search strategy after the genetic operations without
considering the non-dominance properties of solutions. Their method is applied to
the MO flow-shop problem.

The main idea when solving MO scheduling problems is to apply the existing
GA methodologies to the problem at hand. However, there are no traces of
studies on how adequate these methodologies may be. Again, the lack of a fair
methodology for comparing the results does not help to improve this situation.

In order to design adequate genetic operators we need to know the properties 
of solutions and to understand the problem-algorithm landscape. The following 
questions are of much interest: 

1. Are close solutions in the objective function space close in the domain 
space? 

2. Are close Pareto optimal solutions (in the objective function space), 
close in the domain space? 

3. Does crossover of non-dominated solutions generate mostly non- 
dominated solutions? 

4. What type of crossover or mutations favours the creation of non- 
dominated solutions from non-dominated solutions? 

These questions, related to the problem-algorithm landscape have received 
very little attention, although they are of primary importance. 



C. Brizuela, N. Sannomiya, and Y. Zhao



When we design move-operators to deal with neighbourhood construction for 
multi-objective optimization problems, there are also fundamental questions we 
need to answer in order to choose the right operator. In the generated neighbor- 
hood: 

5. Is there always at least one non-dominated neighbour? 

6. Is there a high percentage of non-dominated solutions among the 
neighbours? 

7. Is there any type of neighborhood that favours the generation of non- 
dominated solutions? at least one, (almost always) one, or many? 

There is no trace of research addressing these questions for MO problems. In
the case of single objective scheduling problems such questions are answered in
many works related to the study of landscapes as well as of neighbourhoods (see for
example [?] and references therein).

4 The Proposed Algorithm 

The algorithm we propose here is just the standard GA for MO problems as 
suggested in cni, with a minor modification. The contribution we try to make 
is in the analysis of genetic operators in order to choose the adequate set for a 
given problem. The proposed procedure is in its preliminary stage. Therefore, 
more questions than answers will be highlighted. 

The specific MOGA we use here as a framework is stated as follows.

Algorithm 1. Multi-objective GA.

Step 1. Set r = 0. Generate an initial population POP[r] of g individuals.

Step 2. Classify the individuals according to a non-dominance relation.
Assign a dummy fitness to each individual.

Step 3. Modify the dummy fitness by fitness sharing.

Step 4. Set i = 1.

Step 5. Use RWS to select two individuals for crossover according to
their dummy fitness. Perform crossover with probability p_c.

Step 6. Perform mutation of individual i with probability p_m.

Step 7. Set i = i + 1. If i = g then go to Step 8; otherwise go to Step 5.

Step 8. Set r = r + 1. Construct the new generation POP[r] of g
individuals. If r = r_max then STOP; otherwise go to Step 2.

The procedures involved at each step of this algorithm are explained in the 
following subsections. 

4.1 Individual Representation and Decoding 

Each individual is represented by a string of integers representing job numbers 
to be scheduled. In this representation individual r looks like:

$$ s_r = \left( i_1^{(r)}, i_2^{(r)}, \ldots, i_n^{(r)} \right), \qquad r = 1, 2, \ldots, $$

where $i_k^{(r)} \in J$.

The schedule construction method for this individual is as follows: 

1) Enumerate all machines in M from 1 to m. 

2) Select the first job ($i_1^{(r)}$) of $s_r$ and route it from the first machine
(machine 1) to the last (machine m).

3) Select iteratively the second, third, ..., n-th job and route them
through the machines in the same machine sequence adopted for the
first job (machines 1 to m). This must be done without violating the
restrictions imposed in (1) to (3).
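
The schedule construction above can be sketched as follows. This is only an illustration under stated assumptions: restrictions (1) to (3) are not reproduced in this section, so the sketch assumes the usual permutation flow-shop constraints (a machine processes one job at a time, and a job enters machine k only after leaving machine k-1). The names `proc` and `due`, the 0-based job numbering, and the use of total tardiness as the tardiness measure are hypothetical choices made for the example.

```python
from typing import List, Sequence, Tuple


def completion_times(perm: Sequence[int],
                     proc: Sequence[Sequence[float]]) -> List[float]:
    """Completion time on the last machine for each job, in sequence order.

    proc[j][k] is the assumed processing time of job j on machine k.
    """
    m = len(proc[0])
    ready = [0.0] * m              # time at which each machine becomes free
    finish = []
    for job in perm:
        t = 0.0                    # time at which the job leaves the previous machine
        for k in range(m):
            # The job starts when both the machine and the job are available.
            t = max(t, ready[k]) + proc[job][k]
            ready[k] = t
        finish.append(t)
    return finish


def objectives(perm: Sequence[int],
               proc: Sequence[Sequence[float]],
               due: Sequence[float]) -> Tuple[float, float, float]:
    """Return (makespan, total tardiness, mean flow time) of a job sequence."""
    finish = completion_times(perm, proc)
    makespan = finish[-1]
    tardiness = sum(max(0.0, c - due[j]) for j, c in zip(perm, finish))
    mean_flow = sum(finish) / len(finish)
    return makespan, tardiness, mean_flow
```

For example, `objectives([0, 1, 2], proc, due)` evaluates the sequence that schedules job 0 first, then job 1, then job 2.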



4.2 Genetic Operators 

The selection operator we use here is standard to GA’s, like those proposed 
elsewhere jS|. Two selection processes are distinguished here. 

Selection for mating (Step 5). This is the way we choose two individuals to 
undergo reproduction (crossover and mutation). In our algorithm the so called 
roulette wheel selection (RWS) is used. This selection procedure works based on 
the dummy fitness function assigned to each individual. The way to compute 
the dummy fitness (Step 2) and the way to do the fitness sharing (Step 3) are 
standard (see P). 
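
A minimal sketch of roulette wheel selection is given below for illustration. It assumes non-negative dummy fitness values where larger means better; the function name and the explicit `rng` argument are choices made for this example, not part of the original algorithm description.

```python
import random
from typing import Sequence


def roulette_wheel_select(fitness: Sequence[float], rng: random.Random) -> int:
    """Return the index of one individual, chosen with probability
    proportional to its (shared) dummy fitness."""
    total = sum(fitness)
    if total <= 0:
        raise ValueError("at least one fitness value must be positive")
    pick = rng.random() * total        # a point on the wheel in [0, total)
    cumulative = 0.0
    for index, value in enumerate(fitness):
        cumulative += value
        if pick < cumulative:
            return index
    return len(fitness) - 1            # guard against floating-point round-off
```

Calling the function twice yields the two parents needed in Step 5; whether the two parents must be distinct is left open here.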

Selection after reproduction (Step 8). This is the way to choose individuals
to form the new generation from the set of all parents and all offspring.
In this paper, the best elements are selected from the pool of parents and
offspring.

To define “the best”, the individuals are sorted front by front. Those belonging
to the non-dominated front come first; among these, individuals with better
makespans have higher priority, followed by tardiness, and finally by the mean
flow time. After sorting, all individuals in this front are erased from the
population. The same procedure is applied to the remaining individuals, until we
complete g sorted individuals.

If there are repeated individuals (considering the objective functions), then
these are erased (only one copy of each type, at each step of the sorting process,
is left) and replaced by randomly selected individuals from the pool of parents
and children that were not sorted.
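
The selection after reproduction can be sketched as below. This is a simplified illustration: individuals are represented only by their objective vectors `(makespan, tardiness, mean flow time)`, all objectives are assumed to be minimized, and repeated vectors are simply dropped rather than replaced by random individuals from the unsorted pool. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Objectives = Tuple[float, float, float]   # (makespan, tardiness, mean flow time)


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b everywhere and strictly better somewhere."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def select_best(pool: Sequence[Objectives], g: int) -> List[Objectives]:
    """Pick g individuals front by front from the parent+offspring pool."""
    remaining = list(dict.fromkeys(pool))     # keep one copy of each vector
    chosen: List[Objectives] = []
    while remaining and len(chosen) < g:
        front = [a for a in remaining
                 if not any(dominates(b, a) for b in remaining)]
        # Tuple ordering gives priority to makespan, then tardiness,
        # then mean flow time, as described in the text.
        front.sort()
        chosen.extend(front)
        remaining = [a for a in remaining if a not in front]
    return chosen[:g]
```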

We need to explain now the crossover and mutation operators to be used. 
Three different types of crossover and mutation operators are considered. 

We start explaining the crossover operators (Step 5). 

OBX. This is the well known order-based crossover (see jH|) proposed by 
Syswerda. The position of some genes corresponding to one of the parents are 
preserved in the offspring. 

PPX. Precedence-based crossover. A subset of the precedence relations of the
parents' genes is preserved in the offspring.

2PX. Two-point crossover. This is a special case of OBX with the difference
that two segments of one of the parents are always copied into the offspring.
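
One common way to realize a two-point crossover on permutations is sketched below. The text does not specify which two segments are copied, so this sketch assumes they are the head and the tail of the first parent, with the middle filled by the remaining jobs in the order in which they appear in the second parent. The function name and the explicit cut points are illustrative.

```python
from typing import List, Sequence


def two_point_crossover(p1: Sequence[int], p2: Sequence[int],
                        cut1: int, cut2: int) -> List[int]:
    """Copy p1[:cut1] and p1[cut2:] into the child; fill the middle with
    the jobs not yet used, in the order they appear in p2."""
    head, tail = list(p1[:cut1]), list(p1[cut2:])
    kept = set(head) | set(tail)
    middle = [job for job in p2 if job not in kept]
    return head + middle + tail
```

Because the middle is filled only with unused jobs, the child is always a valid permutation and needs no repair.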

The mutation operators for the flow-shop problem can be considered as move
operators in a neighborhood since, on average, the mutated solution is not far
away from the original solution. The following mutation operators are used (Step
6).

SWAP1. A single swap of two adjacent genes is performed. The locus to
swap is randomly selected.

SWAP2. Two loci are randomly selected and their alleles interchanged.

SWAP3. Two loci ($l_1$, $l_2$) are randomly selected. If $l_1 < l_2$, then the allele
corresponding to $l_1$ is placed on $l_2$ and all genes from $l_1 + 1$ to $l_2$ are shifted one
position towards $l_1$. If $l_1 > l_2$, then the opposite operation is performed.
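
The three move operators can be sketched as follows. The function names, the 0-based loci, and the explicit random generator argument are assumptions made for this illustration; the helper `shift` exposes the deterministic part of the third operator so that it can be checked by hand.

```python
import random
from typing import List, Sequence


def swap1(perm: Sequence[int], rng: random.Random) -> List[int]:
    """Swap two adjacent genes at a randomly selected locus."""
    child = list(perm)
    i = rng.randrange(len(child) - 1)
    child[i], child[i + 1] = child[i + 1], child[i]
    return child


def swap2(perm: Sequence[int], rng: random.Random) -> List[int]:
    """Interchange the alleles at two randomly selected loci."""
    child = list(perm)
    i, j = rng.sample(range(len(child)), 2)
    child[i], child[j] = child[j], child[i]
    return child


def shift(perm: Sequence[int], l1: int, l2: int) -> List[int]:
    """Move the gene at locus l1 to locus l2; the genes in between
    are shifted one position towards l1."""
    child = list(perm)
    child.insert(l2, child.pop(l1))
    return child


def swap3(perm: Sequence[int], rng: random.Random) -> List[int]:
    """Shift move between two randomly selected, distinct loci."""
    l1, l2 = rng.sample(range(len(perm)), 2)
    return shift(perm, l1, l2)
```

All three operators return a new permutation and leave the input untouched.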

Before actually using any of these operators in Algorithm 1 we would like to
know their effects on the non-dominance and distance relations between
the parents and the offspring. It is also important to know the length of the
jumps of each move (mutation) operator in order to understand which is the
most appropriate for the problem at hand.

To do this we start by defining the distance measure we are going to deal 
with. 



5 Distance Measures 

In the classical permutation flow-shop problem, the solution is totally defined by
the sequence of job numbers. Therefore, the distance measure gives an idea of
how different two such sequences are. To compute this difference, each sequence
$s = (j_1, j_2, \ldots, j_n)$ is associated with an $n \times n$ matrix whose elements are defined as
$a_{ij}(s) = 1$ if job j is scheduled before job i, and zero otherwise. Thus, the difference
between schedules $s_r$ and $s_k$ is given by

$$ d(s_k, s_r) = \sum_{j=1}^{n} \sum_{i=1}^{n} a_{ij}(s_r) \oplus a_{ij}(s_k) \qquad (8) $$

where $\oplus$ represents the exclusive-or logical operation. To normalize the distance
(8) we divide it by the maximum number of different elements between two given
associated matrices, i.e.

$$ d_n(s_k, s_r) = \frac{d(s_k, s_r)}{n(n - 1)} . \qquad (9) $$

We call this the domain distance since it uniquely represents the solution 
which is mapped into the objective function space. This type of distance measure 
definition can be found elsewhere 0. 
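
The domain distance of equations (8) and (9) can be computed as in the sketch below. The function names are illustrative, and both sequences are assumed to be permutations of the same set of jobs.

```python
from typing import List, Sequence


def precedence_matrix(seq: Sequence[int]) -> List[List[int]]:
    """a[i][j] = 1 if job j is scheduled before job i, else 0.
    Rows and columns follow the sorted job numbers."""
    pos = {job: idx for idx, job in enumerate(seq)}
    jobs = sorted(pos)
    return [[1 if pos[j] < pos[i] else 0 for j in jobs] for i in jobs]


def domain_distance(sk: Sequence[int], sr: Sequence[int]) -> float:
    """Normalized domain distance d_n of equation (9)."""
    ak, ar = precedence_matrix(sk), precedence_matrix(sr)
    n = len(sk)
    # Equation (8): count the matrix entries that differ (exclusive-or).
    d = sum(ak[i][j] ^ ar[i][j] for i in range(n) for j in range(n))
    return d / (n * (n - 1))
```

Each pair of jobs whose relative order differs contributes two differing entries, so a fully reversed sequence reaches the maximum of n(n - 1) and a normalized distance of 1.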

5.1 Objective Function Distance 

We define the objective function distance (ofd) between solutions $s_r$ and $s_k$ as
the Euclidean distance of their mappings, i.e.

$$ \mathit{ofd}(s_k, s_r) = \left( \sum_{j=1}^{q} \left( f_j(s_k) - f_j(s_r) \right)^2 \right)^{1/2} \qquad (10) $$

for a problem with q objective functions $f_j$ ($j = 1, 2, \ldots, q$).
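
As a small illustration, the objective function distance of equation (10) can be computed directly from two objective vectors. The function name `ofd` follows the abbreviation used in the text; passing the objective vectors rather than the solutions themselves is a simplification made here.

```python
import math
from typing import Sequence


def ofd(fk: Sequence[float], fr: Sequence[float]) -> float:
    """Euclidean distance between two q-dimensional objective vectors."""
    if len(fk) != len(fr):
        raise ValueError("objective vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(fk, fr)))
```

Note that the objectives are not normalized in equation (10), so the objective with the largest numerical range dominates the distance.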

In the case of continuous function optimization the Pareto optimal solutions
are close to each other. Then, if we want to reach any neighbour Pareto solution
from a given Pareto solution, we need to move as little as possible. However,
for discrete domain problems, this continuity property does not hold. Thus, we
need to know how far the Pareto solutions are from one another. We also need
to know what type of move operator is needed to go from one Pareto solution
to another. This is important from the application point of view, since it will
allow us to increase the number of solutions available to the decision maker. Studies
aimed at addressing this issue are presented in the next section.

6 Experimental Setup and Results 

This section is devoted to the study of the genetic operators: mutation (move),
and crossover. We place special emphasis on the distance-dominance relations of
these operators.

The specific problem we are dealing with is a 49-job, 15-machine flow-shop
problem with three objective functions. This problem was proposed in [D|, and 
its solution space size is $49! \approx 6.08 \times 10^{62}$. Experiments and results related to the move
operators are presented.

Fig. 1. Average of ofd for random solutions.



Fig. 2. Domain distance distribution. 
Random solutions. 



6.1 Move Operators 

The first experiment is aimed at studying the relations between the domain distance
and the objective function distance. To do this, a set of 500 random solutions
is generated, and for each domain-distance value (generated by comparing all
against all solutions), the average of the objective function distance is
computed. The experiment is repeated 100 times.

Fig. 3. Average of ofd for SWAP1, SWAP2, and SWAP3. The operators are applied
to random solutions.

Table 1. Dominance relations. The operators are applied to random solutions.

Operator   (o ≻ n)   (o = n)   (o ≺ n)   (o ≻≺ n)
Swap1      42.46     6.04      42.56     8.94
Swap2      34.10     0.43      34.08     31.39
Swap3      33.55     0.41      33.65     32.39



Figure 1 shows the results of this experiment. All averages for all domain-
distance values tend to the same constant objective function value. This tells
us that close/far random solutions, on average, produce similar distances in the
objective function space. The noisy behaviour at both extremes of the curve
is due to the small number of individuals that are present for these values of
domain distance, as shown in Figure 2.



Table 2. Dominance relations. The operators are applied to non-dominated solutions
only.

Operator   (o ≻ n)   (o = n)   (o ≺ n)   (o ≻≺ n)
Swap1      52.23     4.59      36.11     10.83
Swap2      50.18     0.33      17.24     32.35
Swap3      44.70     0.36      19.29     35.65



This result does not give much information on how close/far solutions in the 
domain space are mapped in the objective function space. One would, at first 
glance, expect that close/far solutions in the domain space produce close/far 
mappings in the objective function space. But, this is not the case for randomly 
generated solutions.

Table 3. Dominance relations. The operators are applied to the non-dominated
solutions obtained after one GA run.

Operator   (o ≻ n)   (o = n)   (o ≺ n)   (o ≻≺ n)
Swap1      95.74     0.28      0.40      3.58
Swap2      98.57     0.01      0.02      1.40
Swap3      97.26     0.01      0.05      2.68



The objective of the second experiment is to study how move operators in 
the domain space move in the objective function space. A set of 2000 random 
solutions is generated, and the move operators are applied to each solution. 
The distance between the original solution o (origin) and the new solution n 
(neighbour) is measured along with their objective function distances. 

Figure 3 shows the results for each move operator defined in Section 4. We see
that, as expected, the move operator SWAP1 (a single step in the domain space)
produces neighbours which are close to each other in the objective function space.
The lesson is that if we need to move only a few steps in the objective function
space we can use SWAP1, or choose those solutions generated by SWAP2 or
SWAP3 which are close to their origins in the domain space.

Now, we just need to know about the non-dominance relations generated 
by these move operators. To study these relations we propose the following 
experiment. Again, a set of 2000 random solutions is generated. Each solution 
is modified with each of the three move operators, then the dominance relation 
between the original and the modified solution is counted. The experiment is 
repeated 100 times and the mean is computed. 

If the origin o dominates the neighbour n, then (o ≻ n) is used. If o and
n produce the same objective function values, then (o = n) is used. Neighbour
dominance of the origin and non-dominance of neither the origin nor the
neighbour are expressed by (o ≺ n) and (o ≻≺ n), respectively.
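
The counting step of this experiment can be sketched as follows. The sketch assumes that all objectives are minimized and that each solution is already mapped to its objective vector; the string labels and function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Sequence, Tuple

Vector = Sequence[float]


def relation(o: Vector, n: Vector) -> str:
    """Classify the dominance relation between origin o and neighbour n."""
    if tuple(o) == tuple(n):
        return "equal"                 # same objective function values
    if all(a <= b for a, b in zip(o, n)):
        return "o_dominates_n"         # origin dominates the neighbour
    if all(b <= a for a, b in zip(o, n)):
        return "n_dominates_o"         # neighbour dominates the origin
    return "incomparable"              # neither dominates the other


def relation_percentages(pairs: Iterable[Tuple[Vector, Vector]]) -> Dict[str, float]:
    """Percentage of (origin, neighbour) pairs falling in each relation."""
    counts = Counter(relation(o, n) for o, n in pairs)
    total = sum(counts.values())
    return {key: 100.0 * value / total for key, value in counts.items()}
```

Averaging the dictionaries returned over repeated runs gives one row of a dominance-relation table for a given move operator.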

Table 1 shows the results for random solutions. We see that the three operators
behave similarly except for the number of solutions where no dominance
relation can be determined. Swap2 and Swap3 produce higher values than Swap1.

Table 2 shows the results when only non-dominated solutions are selected
(from the set of random solutions) as origin points. Here we observe that Swap3
produces more promising results than the other operators. This is because, for
Swap3, the cases (o ≺ n) and (o ≻≺ n) together account for 54.95%, while for
the other operators they do not reach 50%.

Fig. 4. Distance and dominance relations.
The OBX operator is applied to random
solutions.



Fig. 5. Distance and dominance rela- 
tions. The PPX operator is applied to 
random solutions. 



Table 3 presents the results when the move operators are applied to non- 
dominated individuals obtained after a 1000-generations-run of a GA. The results 
are the average over 100 runs. These results show how good or bad the used 
GA is. If we can easily find any dominating neighbour after a GA run, then 
it means that the algorithm performs poorly. However, if it is difficult to find 
new dominating solutions then it means that our algorithm performs well (i.e. 
converges to the Pareto-optimal set). Table 3 shows that it is difficult to find a 
dominating solution by using Swap2 or Swap3. 

The analysis of move operators presented here gives us the idea of exploiting 
what can be called the “non-dominated local search” procedure. Here, move 
operators as well as move decisions can be studied to see their influence in the 
quality of the final set of non-dominated solutions. 



6.2 Crossover Operators 

Crossover operators are in charge of information interchange among individuals.
Therefore, it is important to know which individuals are to be chosen, and how
the information should be interchanged among these chosen individuals.

As a first step in the study of crossover operators we analyze the relations
between the domain distance of the parents and the dominance relations between
the parents and the offspring. To do this we use a set of randomly generated
solutions, non-dominated solutions from the set of random solutions, and a set
of non-dominated solutions of the last generation after a 1000-generation run
of a GA. In all cases the experiment is repeated 100 times and the average is
computed.

Figures 4 to 6 show the relations of dominance against the parental distance
when the parents come from the set of random solutions. The three operators,
i.e. OBX, PPX, and 2PX, have similar characteristics. For all distance values,
the cases where a non-dominance relation can be established between the offspring
and at least one of the parents always outnumber the other cases.

Fig. 6. Distance and dominance relations.
The 2PX operator is applied to random
solutions.







Fig. 7. Distance and dominance rela- 
tions. The OBX operator is applied to 
non-dominated solutions only. 



Table 4 shows the overall average results for each operator. We see that all 
operators have very similar averages. 

Figures 7 to 9 show the results when the parents come from non-dominated 
solutions in the set of random solutions. 





Fig. 8. Distance and dominance relations. The PPX operator is applied to
non-dominated solutions only.

Fig. 9. Distance and dominance relations. The 2PX operator is applied to
non-dominated solutions only.



Table 5 gives the average results regardless of the distance between
the parents. This table and Figure 9 clearly show the superiority of the
2PX crossover over the other two types. The
number of cases where the child dominates at least one parent is larger over all
domain distance values. This could explain why Ishibuchi and
Murata jOj found that this operator was adequate when dealing with the MO 
flow-shop problem. 

Figures 10 to 12 and Table 6 present the results when the parents come from 
non-dominated solutions after one GA run.



Fig. 10. Distance and dominance relations. The OBX operator is applied to non- 
dominated solutions obtained after one GA run. 







Fig. 11. Distance and dominance rela- 
tions. The PPX operator is applied to non- 
dominated solutions obtained after one 
GA run. 




Fig. 12. Distance and dominance rela- 
tions. The 2PX operator is applied to 
non-dominated solutions obtained after 
one GA run. 



We can see that for these experiments the number of cases where at least one
parent dominates the offspring increases with increasing values of the domain
distance. Again, the 2PX operator seems to outperform the others, as
shown in Figure 12 and Table 6.

6.3 Comparative Results 

Based on the outcome of the experiments in the previous subsections, we select the
appropriate operators to use in Algorithm 1, and compare our results with those 
presented in 0. 

It is worth mentioning at this point that the operators used in jS] are
different from those used here. In this reference, one-point crossover, SWAP1, and
stochastic remainder selection with respect to the dummy fitness are used as the
crossover, mutation, and selection operators, respectively.

Fig. 13. Tardiness-Makespan relations. Non-dominated solutions in the last generation.
Table 4. Dominance relations. The operators are applied to random solutions.

Crossover  (p ≻ c)   (p = c)   (p ≺ c)   (p ≻≺ c)
OBX        29.23     0         29.25     41.53
PPX        28.31     0         28.81     42.88
2PX        28.65     1.31      28.84     41.19

Fig. 14. Mean Flow Time-Makespan relations. Non-dominated solutions in the last
generation.



Fig. 15. Tardiness-Mean Flow Time rela- 
tions. Non-dominated solutions in the last 
generation. 



Swap3 and 2PX are selected as the genetic operators. The population size is
set as g = 100 individuals. The maximum number of generations is r_max = 1000.
The crossover and mutation rates are p_c = 1.0 and p_m = 0.01, respectively.

Table 7 shows comparative results for the best (Swap3-2PX) and the worst
(Swap1-PPX) combinations of operators. The number of non-dominated solutions,
the average domain distance and objective value distance are shown. As
expected the combination Swap3-2PX gives better results than those given by
Swap1-PPX.

Table 5. Dominance relations. The operators are applied to non-dominated solutions
only.

Crossover  (p ≻ c)   (p = c)   (p ≺ c)   (p ≻≺ c)
OBX        42.52     0         29.82     27.66
PPX        42.42     0         33.26     24.32
2PX        29.00     2.04      53.05     15.91



Table 6. Dominance relations. The operators are applied to non-dominated solutions
obtained after one GA run.

Crossover  (p ≻ c)   (p = c)   (p ≺ c)   (p ≻≺ c)
OBX        93.59     0.86      0.46      5.09
PPX        89.77     0.67      0.60      8.96
2PX        79.45     4.11      0.83      15.61



Table 7. Comparison of non-dominated solutions (NDS) for the best and the worst
combination of operators.

Operators            % of NDS   dn      ofd
Swap3-2PX (best)     58.42      0.048   47.75
Swap1-PPX (worst)    26.16      0.016   27.36



Finally we compare our results with those reported in 0. The projections of
the solutions are shown in Figures 13 to 15. These are solutions that belong to
the non-dominated front of the last generation of a single GA run. We can see
that our results (MOGA1) clearly outperform those of Bagchi (MOGA2) j0|.

7 Conclusions 

A detailed analysis of mutation and crossover operators is presented for a multi- 
objective flow-shop problem. The analysis is focused on how the operators in- 
fluence the generation of non-dominated solutions according to the parental dis- 
tance. 

Based on this analysis we were able to design a high-performance GA and
apply it to a problem presented by Bagchi in jO]. The simulation results show
that our results clearly outperform (in solution quality) the ones presented in
0. The relevance of this work lies in the procedure proposed for choosing the
right operators to use, and not so much in the superiority of our results over those
presented in 0. The analysis of mutation operators also gives some insight into
how to perform effective moves when a “non-dominated local search” is to be
designed.

There are still many open questions related to the landscape of multi-
objective combinatorial optimization problems, specifically for multi-objective
scheduling problems. Our results present just a small but motivating step towards
answering these open questions.


Abstract. This paper adapts metaheuristic methods to develop Pareto optimal
solutions to multi-criteria production scheduling problems. The approach is
inspired by enhanced versions of genetic algorithms. The method first extends the
Nondominated Sorting Genetic Algorithm (NSGA), a method recently proposed
to produce Pareto-optimal solutions to numerical multi-objective problems.
Multi-criteria flowshop scheduling is addressed next. Multi-criteria job shop
scheduling is subsequently examined. Lastly the multi-criteria open shop
problem is solved. Final solutions to each are Pareto optimal. The paper
concludes with a statistical comparison of the performance of the basic NSGA
to NSGA augmented by elitist selection.



1 Multi-criteria Shop Scheduling 

Managing production rarely implies “getting the orders out the fastest way
possible.” Scheduling, an optimization process by which limited resources are
allocated over time among parallel and sequential activities, has now been formally
studied for over five decades. The task can quickly become complex, limiting the
practical utility of combinatorial, mathematical programming and other analytical
methods (Baker, 1974; French, 1982; Sannomiya and Iima, 1996). Nevertheless,
typically, a production manager is looking for ways to simultaneously minimize
tardiness of the jobs from committed shipping dates, maximize the use of expensive
presses, furnaces, reactors and rolling mills, and human resources, minimize the mean
flow time of jobs, etc. Such scheduling situations are multi-criteria or multi-
objective (Table 1). Ironically, most formal scheduling techniques are confined to
single-objective optimization (Pinedo, 1995). In multi-objective scheduling, the objectives
are often conflicting. Such problems are called multiple criteria decision making
(MCDM) problems.

MCDM assumes that all the objectives are dispensable and that all can be traded 
off although some may be more important than the rest. Adulbhan and Tabucanon 
(1980) classified these techniques based on the way the initial multi-objective 
problem is transformed into a mathematically manageable format using conversion of 
secondary objectives into constraints, development of a single combined objective 
function, or treatment of all objectives as constraints. Others have emphasized the 
stage at which the analyst needs information from the decision-maker. Recently,
heuristic methods, which engage the Pareto-optimality concept to solve multi-
objective problems, have been proposed. In multi-objective optimization there usually
exist many solutions that are optimal in the Pareto sense, a concept existing in 
economics. From this viewpoint, due to the plurality of optimal decisions, the most 
desirable decision may be selected after one has generated the nondominated 
solutions. The final solution thus selected is called the preferred solution. 

One approach proposed to search for Pareto optimal solutions is the Nondominated
Sorting Genetic Algorithm (NSGA), created by Srinivas and Deb (1995). NSGA is
based on the concept of niche formation and speciation seen in natural biological
evolution and is actually a clever extension of the Simple Genetic Algorithm (SGA), the
original method created by Holland (1975) to optimize a single objective. This paper
accepts Pareto optimality as the basis for rational choice in multi-objective decision
making and investigates the efficacy of variations of NSGA to solve multi-objective shop
scheduling problems. This needs no prior specification of the decision maker’s
preferences, which may be easier to express once the nondominated solutions are at
hand. Indeed, the use of a weighted sum, if preferred, is a much easier task: it requires
only a direct application of SGA.

In multi-objective optimization with conflicting objectives, there is no unique
optimal solution. A single optimal solution may exist here only when the objectives
are non-conflicting. For conflicting objectives one may at best obtain what
economists call “efficient” or nondominated solutions (Figure 1). An efficient solution
x* (also called a Pareto optimal solution) is one in which no increase can be obtained
in any of the objectives without causing a simultaneous decrease in at least one of the
remaining objectives (Keeney, 1983). However, although the concept of Pareto
optimality is now over 90 years old (Pareto, 1906), methods for finding nondominated
solutions are relatively few. When the factors and the constraints are well-behaved, a
procedure known as the ε-method (Seo and Sakawa, 1988) may find the
nondominated solutions. For other problems, search methods are used. Shop
scheduling decisions involve sequences (of jobs, machines, etc.), hence the constraints
and objectives of relevance here are typically not well-behaved.



Table 1. Multiple management objectives in an enterprise

Department        Objective(s)
Budget            Cost minimization
Production        • Production output maximization
                  • Production time minimization
                  • Resource utilization
Quality Control   • Product quality maximization
                  • Rework minimization
Personnel         Minimization of hiring and firing
Marketing         Uninterrupted supply of products to customers

Fig. 1. Nondominated multi-objective maximizing solutions on the efficient front



2 Machine Scheduling Preliminaries 

A flowshop requires unidirectional flow of work with a variety of jobs all being
processed sequentially in the same order, in a one-pass manner. Each job follows
identical routing through the processing stages or “machines.” Unlimited storage
exists between the machines. The challenge here is to determine the optimum
sequence in which the jobs should be processed so that one or more performance
measures, such as the total time to complete all the jobs, the mean flow time,
or the mean tardiness of jobs from their committed dates, are minimized. A job shop,
on the other hand, involves processing of the jobs on several machines without any
“series” routing structure (Uckun et al., 1993). Open shops are similar to job shops
with the exception that there is no a priori order of the operations to be done within a
single job (Pinedo, 1995). Many shop scheduling problems belong to the NP-hard
class (Lenstra et al., 1977; French, 1982).

Recent advances in metaheuristic search methods that help conduct directed 
“intelligent” search of the solution space have brought new possibilities to rapidly 
find efficient and economic schedules, even if not optimal. These methods are 
context-independent and can be applied even when very little is known about the
(mathematical) structure of the response functions. These methods are stochastic and 
now include genetic algorithms (GAs), tabu search, threshold acceptance and 
simulated annealing. 

The genesis of GAs was an insightful observation by John Holland (1975) that 
some aspects of natural evolution, in particular adaptation, could be cast into useful 
algorithms to seek out solutions to the more difficult global optimization problems. 
GAs discover solutions to global optimization problems adaptively, looking for small, 
local improvements rather than big jumps in solution quality (Goldberg, 1989). While
most stochastic search methods operate on a single solution to the problem at hand,
GAs operate on a population of solutions. To use a GA, however, one must first
encode the solutions to the problem in a chromosome-like structure (Goldberg, 1989).
The procedure then applies crossover and mutation and other processes inspired by
natural evolution to the individuals in the population to generate new individuals
(solutions). The GA uses various selection criteria to pick the best individuals for
mating so as to produce superior solutions by combining parts of parent solutions, akin
to genetically breeding race horses or superior strains of food crops. The objective
function of the problem being solved determines how “good” each individual is.

Two phenomena closely linked to evolution are known as niche formation and 
speciation (Smith, 1995). Niches are behavior patterns that develop when organisms 
compete with each other for limited resources, or they attempt to survive in 
unfavorable environmental conditions. Speciation is the process by which new and 
stable species evolve in nature. NSGA uses the notions of nondomination (Figure 1)
and niche formation. However, the performance of NSGA is significantly enhanced if 
one incorporates elitism in it. This speeds up the rate at which Pareto-optimal 
solutions are discovered. The elitist enhancement of NSGA, called ENGA in this 
paper, incorporates an additional nondominated sorting step to decide which solutions 
would form the parents. 

While the methods presented in this paper solve only “static” flowshop, job shop 
and open shop problems, solutions to multi-criteria scheduling problems even for the 
static case are not easy to develop. However, the method can be extended to a 
dynamic flow, job or open shop. Solving the multi-criteria static open shop may be
extended to multi-criteria classroom scheduling, a problem as common as universities
and colleges themselves.



3 Genetic Algorithms for Sequencing Jobs in a Flowshop 

Genetic algorithms (GAs) belong to the class of heuristic optimization techniques that 
utilize randomization as well as directed smart search to seek the global optima. 
Increasingly, GAs are being found to be more general and abstract (context 
independent) than other popular heuristic techniques presently available. As a result, 
many researchers have already turned to GAs to solve the more difficult and large 
sequencing, lot sizing and even classroom scheduling problems (Carter, 1997). 
Pinedo (1995) provides a clear rendering of GAs in the context of scheduling. The 
description goes briefly as follows and this is the broad approach followed by many 
researchers. 

When applied to flowshop scheduling, GAs can view job sequences directly as 
“chromosomes” (the candidate schedules or solutions), which then constitute the 
members of a GA population. Subsequently, each individual (a schedule) is 
characterized (merited) by its fitness (e.g. by its makespan value). For a flowshop a 
chromosome would represent a job sequence on a machine, such as [1 3 2 4 5]. 
Fitness evaluation for a sequence would go, for instance, as the smaller its makespan, 
the “fitter” it is. As the GA executes, in each generation the “fittest” chromosomes are 
encouraged to reproduce while the least fit “die.” A mutation in a parent chromosome
may be an adjacent pairwise interchange of jobs or some variation of it in the
corresponding sequence. Mutation here is designed to perform random exchange of
jobs (“genes”) between two randomly selected positions on a target chromosome. An
example is shown below.



[Figure: mutation of a parent chromosome into a child. Step 1: select two genes randomly. Step 2: exchange their positions.]
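The two mutation steps can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation; the function name `swap_mutation` and the optional `rng` argument are assumptions introduced here.

```python
import random

def swap_mutation(sequence, rng=random):
    """Return a copy of `sequence` with two randomly chosen genes exchanged."""
    child = list(sequence)                   # leave the parent untouched
    i, j = rng.sample(range(len(child)), 2)  # Step 1: pick two distinct positions
    child[i], child[j] = child[j], child[i]  # Step 2: exchange their contents
    return child

parent = [1, 3, 2, 4, 5]
child = swap_mutation(parent, random.Random(0))  # seeded only for repeatability
```

Because the operator only exchanges two jobs, the child is always a valid permutation and needs no repair.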



A crossover may combine some features of two parent chromosomes to create 
progenies inheriting some characteristics from each parent. A “repair” scheme may 
be set up to ensure that only feasible progeny sequences are produced. Thus the child 
[1 3 2 2 5] would require some “repair” (replacement of the repeated job (2) by 
the missing job (4)). An example of position-based crossover that shows the transfer 
of jobs from two “parents” to form a feasible “child” would be 



[Figure: position-based crossover of parent 1 and parent 2 into a feasible child.]
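A hedged Python sketch of both ideas follows. The crossover shown is one common form of position-based crossover (randomly chosen positions are copied from parent 1, and the gaps are filled in the order the remaining jobs appear in parent 2); the original figure is not recoverable, so the exact variant used in the paper is an assumption. The `repair` helper, also hypothetical, replaces each repeated job with a missing one in ascending order.

```python
import random

def position_based_crossover(parent1, parent2, rng=random):
    """Copy random positions from parent1; fill the rest in parent2's order."""
    keep = [rng.random() < 0.5 for _ in parent1]
    child = [job if k else None for job, k in zip(parent1, keep)]
    taken = {job for job in child if job is not None}
    fill = iter(job for job in parent2 if job not in taken)
    return [job if job is not None else next(fill) for job in child]

def repair(sequence, jobs):
    """Replace repeated jobs by the jobs missing from `sequence`."""
    missing = iter(sorted(set(jobs) - set(sequence)))
    seen, fixed = set(), []
    for job in sequence:
        if job in seen:          # a repeated job: substitute a missing one
            job = next(missing)
        seen.add(job)
        fixed.append(job)
    return fixed

child = position_based_crossover([1, 2, 3, 4, 5], [5, 4, 3, 2, 1], random.Random(1))
fixed = repair([1, 3, 2, 2, 5], [1, 2, 3, 4, 5])  # the infeasible child from the text
```

The position-based operator always yields a feasible permutation by construction; `repair` is only needed for operators, such as a plain one-point cut, that can duplicate jobs.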




4 NSGA Solutions to the Multi-objective Flowshop Problem 

Work on the multi-objective flowshop problem has begun relatively recently. Murata, 
Ishibuchi and Tanaka (1996) provide a GA-based approach to produce a tentative set 
of Pareto optimal solutions to the problem combining the minimization of makespan, 
the minimization of total tardiness in a flowshop, and the minimization of total flow 
time. The method minimizes a weighted sum of multiple objectives $\{f_i(x)\}$ given by 

$$f(x) = w_1 f_1(x) + w_2 f_2(x) + \dots + w_n f_n(x)$$

Fitness is calculated from the weighted sum objective function. The weights $\{w_i\}$ 
in this weighted sum are not the same for each solution; rather they are randomly 
varied so as to probe various search directions. The investigators qualitatively 
compare their solutions to those obtained using VEGA (see Srinivas and Deb, 1995). 
Tamaki et al. (1995) perform multi-criteria optimization of hot rolling by first decomposing 
the problem into mathematical programming subproblems; a GA combined with 
local search is then used to develop solutions. Thus, GAs seem to have been exercised well 
in the past five years to tackle the single-objective flowshop, and the results appear to 
be comparable to those of the best-performing heuristics available. Multi-criteria flowshops, 
however, have until now been sequenced as weighted-average objective problems. 
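The randomly weighted sum described above can be illustrated with a short Python sketch. Normalizing the weights so that they sum to one is an assumption made here for convenience, and the function names are hypothetical.

```python
import random

def random_weights(n_objectives, rng=random):
    """Draw one weight per objective and scale them to sum to 1 (assumed)."""
    raw = [rng.random() for _ in range(n_objectives)]
    total = sum(raw)
    return [r / total for r in raw]

def weighted_sum(objectives, weights):
    """Scalar fitness f(x) = w_1 f_1(x) + ... + w_n f_n(x)."""
    return sum(w * f for w, f in zip(weights, objectives))

weights = random_weights(3, random.Random(0))      # a fresh draw per evaluation
fitness = weighted_sum([10.0, 20.0, 30.0], weights)
```

Drawing new weights each time a fitness value is needed is what lets a single scalar GA probe several search directions in objective space.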

The kernel of NSGA (Srinivas and Deb, 1995), the subject of the present work, is 
the ranking selection method it uses to emphasize the Pareto optimal regions where 
the desirable solutions reside. NSGA also uses a niche forming procedure to maintain 
a stable population of good solutions (thus avoiding genetic drift). Thus NSGA 
differs from SGA in the manner the selection operator works. The crossover and 
mutation operators in NSGA work as they do in SGA, so the representation (solution 
coding) schemes can be identical in SGA and NSGA. However, before selection is 
performed in NSGA, the population is ranked on the basis of the nondominated 
sorting concept (Figure 2), to emphasize Pareto-optimality. An illustration follows. 

Suppose we wish to minimize makespan, mean flow time as well as the mean 
tardiness of jobs in a flowshop simultaneously. In this illustration the solutions 
(permutation representation of job sequences) are coded by the procedure described in 
Section 2 above. Crossover is one-point while mutation is adjacent job-interchange. 

Reproduction would give preference to nondominated members identified by 
nondomination ranking of all solutions in the population. Sharing to encourage niche 
formation would be phenotypic, derating the dummy fitness value of each solution by 
dividing it by its niche count (see Srinivas and Deb, 1995). NSGA is initiated with a 
randomly picked collection (population) of permutations involving n jobs. Table 2 
displays a partial list of the processing times and the due dates for a typical flowshop 
problem in which 49 jobs are to be optimally sequenced in a 15-machine flowshop. 
49! different solutions are possible here, from which the nondominated solutions must 
be separated. The three objectives to be simultaneously minimized are makespan, 
mean flow time and mean tardiness of jobs. 
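As a sketch of how the three objectives could be evaluated for one candidate permutation, the Python function below uses the standard permutation-flowshop recursion (a job starts on a machine once both the machine and the job's previous operation are finished). It assumes all jobs are available at time zero, so flow time equals completion time; the three-job, two-machine data are hypothetical and are not taken from Table 2.

```python
def flowshop_objectives(sequence, proc, due):
    """Return (makespan, mean flow time, mean tardiness) for a job sequence.

    proc[job] lists the processing times on machines 1..m; due[job] is the
    due date. All jobs are assumed to be released at time 0.
    """
    n_machines = len(proc[sequence[0]])
    ready = [0] * n_machines          # time at which each machine becomes free
    completion = {}
    for job in sequence:
        t = 0
        for k in range(n_machines):
            t = max(t, ready[k]) + proc[job][k]
            ready[k] = t
        completion[job] = t           # finish time on the last machine
    n = len(sequence)
    makespan = max(completion.values())
    mean_flow = sum(completion.values()) / n
    mean_tardiness = sum(max(0, completion[j] - due[j]) for j in sequence) / n
    return makespan, mean_flow, mean_tardiness

proc = {1: [3, 2], 2: [1, 4], 3: [2, 1]}   # hypothetical processing times
due = {1: 6, 2: 5, 3: 9}                   # hypothetical due dates
result = flowshop_objectives([2, 1, 3], proc, due)
```

Each chromosome in the population would be scored this way, and the resulting triple is what the nondomination ranking compares.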

The NSGA may be optimally parameterized using a design-of-experiments (DOE) 
approach (Bagchi and Deb, 1996) employing pilot GA runs. This methodology is 
new in the domain of GA parameterization and is therefore briefly recalled here. A 
critical difficulty in applying GAs is that the various parameters must be correctly 
chosen to ensure the GA’s satisfactory on-line and off-line performance. Using 
NSGA satisfactorily would be no exception. Further, the answers found here (the 
optimum values for the GA parameters $p_s$, $p_c$, $p_m$, etc.) are often problem-dependent. 
Also, one notes that crossover and mutation effects can interact and “support each 
other in important ways” and observes that a judicious blend of mutation and 
crossover does better than either one alone to strike a good balance between 
exploration of the total solution space and exploitation of good solutions currently at 
hand. It is easy to see, therefore, that the optimization of different GA parameters 
itself is a global search problem and one that must be tackled in the problem domain 
of interest, before we apply the GA in a “production run”. 

Table 2. Partial list of processing times and due dates for a 15 m/c-49 job flowshop 

m/c #     1   2   3   4   5   6   7   8   9  10  11  12  13  14  15   Due Date
Job 1    74  72  54  57  52  60   4   8  40   8  85  45  74  67  48         80
Job 2    99  77  58  50  31  67  19  96  93  29  27   6  85  22  48        160
Job 3    15  10  85   2  92  53  60  63  11  94  44  71  19  99  94        240
...
Job 49    0  77  72  40   0  64  38  34  76  79  39   1  64  23  17       3920



In multi-objective optimization, which here is the domain of NSGA, we would be 
interested in a good set of Pareto-optimal solutions. Good Pareto optimal solutions 
have two properties: (1) They lie on the Pareto-optimal front, and (2) They are well- 
dispersed on the front (the solutions should not form “clumps”). 

To parameterize NSGA, the concept of nondominance is used to select the 
response factor, as follows. All solutions obtained in a fixed number of GA iterations 
by conducting a set of DOE experiments are pooled together and then subjected to 
non-dominated sorting. The count of distinct Front 1 members contributed by each 
experiment in the DOE matrix was used as the response. Dispersion was evaluated 
subjectively. 
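The response described above (the number of distinct Front 1 members each experiment contributes to the pooled set) can be sketched in Python as follows. All objectives are assumed to be minimized; the function names and the two small "runs" are hypothetical.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def front1_counts(experiments):
    """Map each experiment name to its count of distinct pooled Front 1 points."""
    pool = {(name, tuple(p)) for name, pts in experiments.items() for p in pts}
    points = {p for _, p in pool}
    front1 = {p for p in points if not any(dominates(q, p) for q in points)}
    return {name: sum(1 for n, p in pool if n == name and p in front1)
            for name in experiments}

experiments = {
    "run A": [(1, 5), (3, 3), (1, 5)],   # the duplicate is counted once
    "run B": [(2, 2), (5, 1), (4, 4)],
}
counts = front1_counts(experiments)
```

A parameter setting whose run contributes more points to the pooled first front is scored as the better setting in the DOE analysis.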

The effectiveness of NSGA is observed to be affected by the mutation and 
crossover probabilities $p_m$ and $p_c$, the population size $p_s$, and the NSGA parameter 
$\sigma_{share}$, which controls the precision with which the Pareto optimal front is combed for 
the existence of a possible optimal point or peak. For this illustration we use again 
the 15 machine-49 jobs flowshop problem, the objectives being minimization of 
makespan, minimization of mean flow time, and minimization of mean tardiness. 

Figure 3 displays the factor effects and the interaction between the parameters, 
indicating the relatively strong impact on convergence of solutions to the Pareto front 
of population size ($p_s$), probability of mutation ($p_m$) and interaction between $p_c$ and $p_m$. 
The results also indicated that a high population size ($p_s$), low $\sigma_{share}$ (= ds), low 
probability of mutation and a high probability of crossover would be the best 
parameter combination for NSGA. 

Subsequently, in the implementation of NSGA these parameters would be set at the values 
$p_s$ = 200, ds (= $\sigma_{share}$) = 1, $p_c$ = 0.9 and $p_m$ = 0.0. We note that DOE-based 
parameterization appears to work considerably better than the use of ad hoc values or 
arbitrary “rules-of-thumb” parameterization guidelines culled from the GA literature. We 
also note that the DOE method is exploratory, giving the analyst freedom to make the 
parameterization problem-specific. 



5 How NSGA Produced Pareto Optimal Job Sequences 

Table 3 presents a partial display of the results of running NSGA for 250 generations 
with 37 different initial random populations. As Table 3 shows, in this instance 
about 22% of the population resulting at the end of 250 generations was 
nondominated or Pareto optimal. The actual number of Pareto optimal solutions 
discovered by NSGA is a function of the Pareto optimal landscape (the number of 
Pareto optimal solutions that exist for a problem) and the population size ($p_s$). 
Experimental evidence indicates that the larger the number of jobs being 
sequenced, the higher should be the value of $p_s$ to facilitate the finding of as many 
Pareto optimal solutions as possible for a given number of NSGA generations 
executed. We record here that the average time to execute NSGA for the 49-job 
15-machine problem on an HP 9000/850 system running C++ was 34 seconds. We 
recount briefly how NSGA achieved this and then mention how the quality of the 
solutions may be further improved. The two key strategies employed by NSGA are 
(1) the use of nondominated ranking to seek individuals with the best Pareto ranks, 
and (2) the use of fitness sharing to obtain as many nondominated solutions as 
possible. 



Fig. 2. The Nondominated Sorting GA 

Fig. 3. Factor effects on the discovery of Pareto optimal solutions by NSGA 

Table 3. Partial list of the count of Pareto optimal solutions found by NSGA for the 15 m/c-49 job flowshop 

Generations        0     25     50     75    100    125    150    175    200    225    250
Seed # 1           7      7     16     15     25     36     29     26     27     27     27
Seed # 2           2      8     16     10     15     26     25     23     26     23     23
Seed # 3           3     11      3     18     20     21     11     16     12     11     12
...
Seed # 37          4     12     20     17     32     32     28     26     27     30     33
Average Count    6.2   10.9   11.7   15.4   17.7   22.4   23.7   22.8   22.3   22.4   22.2



For each NSGA iteration, an initial set of solutions (randomly generated from a seed) 
started the process. The nondominated individuals among these solutions were first 
identified from the initial population of solutions. These initial nondominated 
individuals constitute the first nondominated front in the population and were assigned a 
dummy fitness value proportional to the population size (dummy fitness = $p_s$). This 
dummy fitness is intended to give an equal reproductive chance to all these initial 
nondominated individuals. Diversity among the solutions was maintained by 
inducing niche formation by sharing fitness among neighboring individuals. Next, 
the individuals in the first front were ignored temporarily and the rest of the 
population was attended to, as follows. Individuals on the second nondominated front 
were identified. These second front solutions were then assigned a new and equal 
fitness value kept smaller than the shared dummy fitness value of the solutions on the 
first front. This was done to differentiate between the members of the first front and 
the members of the second front. Then, sharing was again done within the second 
front, and the process went on until the whole population had been evaluated and classified 
into successive fronts. This process led to the creation of several successive fronts of 
“nondominated” individuals. 

Next, individuals in the whole population were reproduced according to their 
relative (shared dummy) fitness value. This approach facilitates the search for the 
nondominated regions of the Pareto-optimal fronts: it results in the quick 
convergence of the population towards the nondominated region, while sharing helps 
to distribute the individuals over the entire nondominated region. 
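The front-by-front fitness assignment can be sketched in Python as below. Several details are assumptions made for illustration: the sharing function is the form $1 - (d/\sigma_{share})^2$ commonly associated with NSGA, distance is measured in objective space with `math.dist`, and each later front starts from 95% of the smallest shared fitness of the front before it (the hypothetical `shrink` factor). The fronts are taken as given.

```python
import math

def sharing(d, sigma_share):
    """Sharing function: 1 at distance 0, falling to 0 at sigma_share."""
    return 1.0 - (d / sigma_share) ** 2 if d < sigma_share else 0.0

def shared_dummy_fitness(fronts, points, sigma_share, shrink=0.95):
    """Assign a shared dummy fitness to every individual, front by front."""
    fitness = {}
    dummy = float(len(points))        # first front: dummy fitness = population size
    for front in fronts:
        for i in front:
            niche = sum(sharing(math.dist(points[i], points[j]), sigma_share)
                        for j in front)   # includes i itself, so niche >= 1
            fitness[i] = dummy / niche
        # the next front starts below the worst shared fitness of this one
        dummy = shrink * min(fitness[i] for i in front)
    return fitness

points = [(1, 5), (2, 2), (3, 3), (5, 1), (4, 4)]  # hypothetical objective vectors
fronts = [[0, 1, 3], [2], [4]]                     # indices grouped by front
fitness = shared_dummy_fitness(fronts, points, sigma_share=4.0)
```

In this example the middle point of the first front has two neighbours within `sigma_share` and so receives a lower shared fitness than the two end points, which is the crowding penalty that spreads the population along the front.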

The efficiency of NSGA in seeking Pareto optimal solutions lies in the manner it 
reduces multiple objectives to be optimized to a single dummy fitness function using 
the nondominated sorting procedure. If a solution is locally dominated, it is globally 
dominated. However the converse is not true (Srinivas and Deb, 1995). In each 
generation NSGA finds locally nondominated points only. However, if there exists 
any globally nondominated member, NSGA increases the likelihood of its survival. This 
is because in the NSGA scheme, any such individual will have the highest possible 
dummy fitness. Thus the power of NSGA lies in the successful transformation of a 
multi-objective problem, no matter how many objectives there are, to a single 
function problem, without losing the perspective of vector optimization. By selecting 
nondominated points, NSGA actually processes the schemata that represent Pareto- 
optimal regions. Therefore, the building blocks for NSGA will be those schemata that 
represent characteristics of globally nondominated individuals. 

NSGA implemented here did not incorporate mating restriction, a phenomenon 
that encourages speciation (formation of species among solutions) in nature. Deb and 
Goldberg (1989) have shown that speciation improves the discovery of multiple 
peaks. Yet another method based on elitism may improve the rate at which the Pareto 
optimal solutions are found. This method is outlined in the next section, where it 
is also statistically compared with NSGA using an assortment of tri-objective 
flowshop problems as test beds. 

6 An Elitist Multi-objective GA for Sequencing the Flowshop 

This section describes a different genetic algorithm that obtains Pareto optimal 
solutions faster than NSGA. When NSGA is used, it is often seen that it lacks 
somewhat in both on-line performance (rapid convergence to good solutions) and off- 
line performance (superior quality of the final solutions). One key reason for this is 
that NSGA does not preserve the good solutions found from one generation to the 
next generation. Thus, good (near-optimal) solutions lost in one generation have only 
a probabilistic chance in NSGA to reappear in the future. Also, the number of final 
solutions on the Pareto front in NSGA often remains relatively low even with good 
choice of parameters and even after many generations, unless large population sizes 
(> 250) are used. To deal with this deficiency, an enhancement may be devised for 
NSGA that effects a significant performance improvement. Such a GA would be 
elitist in that it would consciously preserve a controlled fraction of the best structures 
or solutions present in a generation to the next generation. The value of preserving 
the elite (high fitness solutions) in GA is well-recognized. 

Citing studies done in 1975, Goldberg (1989) notes that an elitist plan significantly 
improves both on-line and off-line performance of GAs on unimodal surfaces. 
Goldberg also notes that elitism improves local search, albeit at some expense of 
global perspective. The enhanced version of NSGA that we describe here is called 
ENGA (the Elitist Nondominated Sorting GA, Figure 4). Unlike NSGA, ENGA is 
designed so as not to mercilessly discard the old population and replace it completely 
by the progeny. Like NSGA, ENGA first produces the progeny through crossover 
and mutation (equal in number to the parents). But it uses a different selection procedure. 
It first ranks the candidate constituents of the next generation by performing an 
additional nondominated sorting of the combined parents + progeny pool. A 
controlled fraction (the top 50%) of the individuals in this combined pool is then 
selected to form the next generation, to mate and propagate their nondominated 
schema characteristics. Thus each generation may contain several 
parent chromosomes without modification if these parents are good enough 
to outrank (in the nondomination sense) some of the newly created progeny. Beyond 
this, in subsequent iterations ENGA improves the solutions by the combined effects of 
recombination, mutation and fitness sharing. 
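The elitist selection step can be sketched in Python as follows. The sketch ranks the combined pool by nondomination front and keeps the better-ranked half; how ties within the last admitted front are broken is not stated in the text, so keeping pool order is an assumption, and objective vectors stand in for the schedules themselves.

```python
def dominates(a, b):
    """True if a is no worse than b everywhere and strictly better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def rank_by_front(points):
    """Return the front number (1 = nondominated) of every point."""
    rank = [0] * len(points)
    remaining = set(range(len(points)))
    level = 0
    while remaining:
        level += 1
        front = {i for i in remaining
                 if not any(dominates(points[j], points[i]) for j in remaining)}
        for i in front:
            rank[i] = level
        remaining -= front
    return rank

def elitist_survivors(parents, progeny):
    """Keep the top 50% of the combined parents + progeny pool by front rank."""
    pool = parents + progeny
    rank = rank_by_front(pool)
    order = sorted(range(len(pool)), key=lambda i: rank[i])  # stable sort
    return [pool[i] for i in order[:len(pool) // 2]]

parents = [(2, 2), (4, 4)]   # hypothetical objective vectors
progeny = [(1, 5), (3, 3)]
survivors = elitist_survivors(parents, progeny)
```

Here the parent (2, 2) survives unchanged because it outranks the child (3, 3), which is the behaviour the paragraph above describes.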

In order to evaluate their relative performance, both NSGA and ENGA were coded 
in the C++ language running on an HP 9000/850 and then tested on a number of different 
flowshop problems. The overall quality of solutions produced by ENGA was 
indistinguishable from that of the solutions produced by NSGA, i.e. both ENGA and NSGA 
produced solutions with a comparable degree of convergence to the Pareto front with 
identical computational effort employed. But for the majority of problems tested, ENGA 
produced many more Pareto solutions for the same effort. 

The runs hinted that ENGA might be populating the Pareto front faster. To verify 
this, the Wilcoxon signed-rank statistical test was applied to the results obtained. Of 21 
randomly selected flowshop problems tested using identical GA population size, in 19 
problems ENGA produced a numerically larger number of Pareto optimal schedules; in 
two, NSGA produced more Pareto optimal solutions. Statistically speaking, therefore, 
ENGA outpaced NSGA in the discovery of Pareto optimal solutions. Figures 5, 6 and 
7 display typical Pareto optimal and dominated solutions found by 100 iterations 
of ENGA for a 49-job 15 m/c flowshop, a 10-job 5 m/c job shop and a 10-job 10 m/c 
open shop (Jayaram, 1997). No other method known produces such solutions. 
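The per-problem counts needed to rerun the Wilcoxon signed-rank test are not reproduced here. As a rough cross-check that uses only the reported 19-to-2 split, one can apply the simpler sign test, which is a different and less powerful test than the one the authors used. The Python sketch below computes its exact two-sided p-value under the null hypothesis that either algorithm is equally likely to win on a given problem.

```python
from math import comb

def sign_test_p_value(wins, losses):
    """Exact two-sided sign-test p-value for a wins/losses split (ties dropped)."""
    n = wins + losses
    k = max(wins, losses)
    # probability of a split at least this lopsided in one direction
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * tail)

p = sign_test_p_value(19, 2)   # ENGA ahead on 19 problems, NSGA on 2
```

The resulting p-value is well below 0.001, which is consistent with the conclusion drawn in the text, although it says nothing about the size of the difference.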

7 Conclusions 

This paper has demonstrated how Pareto optimal shop schedules may be developed. 
In particular it has shown that nondominated sorting augmented by elitism (modeled 
here by ENGA) statistically improves the speed of search to seek out multiple Pareto 
optimal solutions. The method applies to the optimization of an arbitrary number of 
conflicting objectives. In this work the GAs were all optimally parameterized using 
the design of experiments procedure, also done for the first time for GAs. This 
approach proved to be highly productive. This paper used three-objective flowshops 
(Figure 5), three-objective job shops (Figure 6) and two-objective open shop problems 
(Figure 7) to demonstrate the techniques. We note here that open shop multi-objective 
problems thus solved can be easily extended to multi-criteria school 
timetabling. Computationally, problems involving up to 15 machines and 50 jobs 
were solved in the present work in one minute or less on systems comparable to an 
HP 9000/850 machine using C++ coding. More extensive tests of the procedure using 
larger problems are suggested. 

Fig. 4. The elitist nondominated sorting GA 

Fig. 5. 3-objective 49-job 15 m/c flowshop: Pareto optimal solutions 

Fig. 6. 3-objective job shop: A(1), A(2) & A(3) are Pareto optimal 

Fig. 7. 2-objective open shop: A(1), A(2) & A(3) are Pareto optimal 



Abstract. Many multiple objective genetic algorithms have been developed to 
approximate the efficient frontier of solutions for multiple objective 
optimization problems. However, only a limited number of comparison studies 
have been performed on practical problems. One of the reasons for this may be 
the lack of commonly accepted measures to compare the solution quality of sets 
of approximately optimal solutions. In this paper, we perform an extensive set 
of experiments to quantitatively compare the solutions of two competing 
algorithms for a bi-criteria parallel machine-scheduling problem. 



1 Introduction 

Many real world problems naturally involve multiple objectives. For most multiple 
objective optimization problems, it is much more difficult to find optimal solution(s) 
than for the single objective counterparts. Often, even the single objective version of 
the problem may be NP-hard, which ensures that the multiple objective optimization 
problem is also NP-hard. There has been a significant amount of research on 
developing heuristics and metaheuristics to find approximate solution(s). Among 
these heuristic approaches are genetic algorithms, simulated annealing, etc. 

Due to the ability to find multiple solutions simultaneously in each run, 
multiobjective genetic algorithms are believed to be well suited as a posteriori 
solution approaches for multiple objective optimization problems [8, 14]. The 
advantage of a posteriori solution approaches is that they do not require information 
on the decision maker’s preferences in advance, which is usually very difficult, if not 
impossible, to obtain. The goal of a posteriori multiple objective genetic algorithms is 
to approximate the efficient frontier to provide a set of solutions that represent the 
trade-offs among objectives to the decision maker. The assumption is that the decision 
maker can evaluate the set of alternative solutions and eventually can select the most 
preferred solution amongst them. However, one can also find multiple objective 
genetic algorithms following an a priori approach in the literature [13]. 

Since the Vector Evaluated Genetic Algorithm (VEGA) approach was developed 
[12], many multiple objective genetic algorithms have been presented. Coello [3] 
provides a thorough review of such approaches. However, there exist only a few 
studies that compare the performance of these algorithms and their effectiveness in 
solving real-world problems [16]. One of the reasons for this is that there is no 
consensus about the desired attributes of an approximation (i.e., set of non-dominated 
solutions generated by a heuristic) among researchers; furthermore, there is no 
commonly used measure to evaluate the solution quality of an approximation. In fact, 
researchers have used several different measures to verify the superiority of their 
algorithms. 

In addition, extensive experimentation is needed to compare competing algorithms 
thoroughly, since the performance of algorithms, (especially their solution quality), is 
dependent on parameter settings, (e.g. cross over probability, mutation probability, 
type of elitism, number of elite solutions kept, fitness function type, selection scheme, 
etc.), the stochastic nature of genetic algorithms (i.e. different random number seeds), 
and the set of problem instances tested. 

In this paper, we illustrate the use of a new quantitative measure for comparing 
algorithms for multiple objective combinatorial optimization problems by applying it 
to two different multiple objective genetic algorithms for a parallel machine- 
scheduling problem. This problem has numerous real-life applications and is strongly 
NP-hard for both performance measures of interest. To compare the performance of 
these algorithms, we use a new measure called Integrated Convex Preference (ICP), 
which was introduced in [7] and is described in the next section. 

The paper is organized as follows. In the next section, a brief description of the ICP 
measure is provided. The parallel machine scheduling problem of interest and the 
two competing genetic algorithms are described next, followed by the experimental scheme, 
results and analysis. Finally, conclusions and directions for future research are given. 



2 Integrated Convex Preference (ICP) Measure 

The ICP measure recognizes that in a posteriori solution approaches, the decision 
maker eventually chooses the most preferred single solution among a proposed set of 
Pareto-optimal solutions based on his or her value structure. In an a posteriori 
solution approach, the decision maker’s utility/value function is not considered during 
the solution procedure. However, while we do not have absolute information on the 
value function, we can make weaker assumptions that will facilitate the evaluation of 
the quality of the solutions generated by heuristics. A set of approximate solutions that is 
more likely to provide the best single solution to the decision maker can be 
regarded as a better set of solutions than alternative solution sets. 

A variety of value functions - functions that combine K different objective 
function values into a scalar quantity - have been used to compare interactive solution 
techniques in the MCDM literature, including linear additive functions, multiplicative 
models, nonlinear additive functions, multi-linear, quadratic, square root and L4-norm 
functions (see [1] for a set of references to the use of each value function). In many 
cases, the decision maker’s value function can be expressed as a convex combination 
of the objectives (linear or non-linear objectives), which means that there is a weight 
vector that specifies a weight for each objective. In this case, an optimal solution is 
guaranteed to be an efficient extreme solution namely, an extreme point of the convex 
hull of efficient solutions and, for each value of the weight vector, there will be at 
least one efficient extreme solution that will yield the optimal blended objective 
value. 

The following provides a formalization of the approach used to develop the ICP 
measure. Represent the blended, or combined, objective function as a parameterized 
family of functions $g(x; \alpha)$, where a given value of the parameter (weight) vector $\alpha \in A$ 
represents a specific (scalar-valued) objective function to be minimized. Given a 
finite set, $X$, of non-inferior solutions, there is at least one optimal solution from this 
set for the objective given by $g(x; \alpha)$ for any fixed value of $\alpha$. For a given $g$, define a 
function $x_g: A \to X$ that maps parameter values to a corresponding optimal solution. 

Figure 1(a) shows a set of five non-inferior solutions for a bi-criteria minimization 
problem, two of which are not supported. Note that in the bi-criteria case the weight 
on the second objective is $1 - \alpha$ for a convex combination of objectives. In Figure 1(a) 
we show that any particular weight on the objective functions (e.g., $\alpha = 0.5$) leads to 
at least one supported solution. Figure 1(b) shows that there is a range of weights for 
which any particular supported solution will be chosen. Any one of these weights 
will give an objective vector in the cone indicated in that figure. In Figure 1(c), we 
show the range of weights for which different supported solutions will be chosen. In 
particular, for the case shown in Figure 1(b), Figure 1(c) shows the interval of weights 
$\alpha$ for which that point will be chosen. It is well known that, for any supported 
point, this weight range is a single interval. 



Fig. 1. (a) The optimal solution for $\alpha = 0.5$, (b) Range of weights for which a given 
supported solution is optimal, (c) Optimality ranges for each supported solution 



For a given value function, use a function $h: A \to \mathbb{R}^{+}$ to indicate a density over the 
parameter values such that $\int_{\alpha \in A} h(\alpha)\, d\alpha = 1$. We are interested in evaluating the 
measure of integrated convex preference (ICP): 

$$ICP(X) = \int_{\alpha \in A} h(\alpha)\, g(x_g(\alpha); \alpha)\, d\alpha , \qquad (1)$$

which maps sets of solutions to the real numbers. The function $h(\alpha)$ is used here to 
represent the uncertainty in the decision maker’s evaluation of his/her preference, and 
not to indicate that the weight $\alpha$ is random. Because $x_g$ is piecewise constant, the 
integral in (1) decomposes into the portions of the domain $A_x = x_g^{-1}(x)$ corresponding 
to each element $x \in X$: 

$$ICP(X) = \sum_{x \in X} \int_{\alpha \in A_x} h(\alpha)\, g(x; \alpha)\, d\alpha . \qquad (2)$$



Adding new solutions to the set $X$ can only improve (reduce) the value of 
$g(x_g(\alpha); \alpha)$ for each value of $\alpha$, and so $ICP(X)$ is monotonically non-increasing over 
increasing sequences of sets of solutions. 
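For intuition, the bi-criteria case can be approximated numerically. The Python sketch below assumes a linear blend $g(x; \alpha) = \alpha f_1(x) + (1 - \alpha) f_2(x)$ and a uniform density $h$ on $[0, 1]$, and it integrates with a simple midpoint rule; none of these choices is claimed to match the exact procedure of [7], and the function name is hypothetical.

```python
def icp_uniform(solutions, steps=10_000):
    """Approximate ICP(X) for bi-criteria minimization with uniform h.

    solutions: list of (f1, f2) objective pairs. For each weight alpha the
    best blended value over the set is taken, then averaged over [0, 1].
    """
    total = 0.0
    for s in range(steps):
        alpha = (s + 0.5) / steps            # midpoint of the s-th subinterval
        total += min(alpha * f1 + (1 - alpha) * f2 for f1, f2 in solutions)
    return total / steps

two_points = icp_uniform([(1, 5), (5, 1)])
three_points = icp_uniform([(1, 5), (5, 1), (2, 2)])   # one more supported point
```

Adding the point (2, 2) lowers the measure, in line with the monotonicity property stated above; adding a dominated or unsupported point would leave it unchanged.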

Details of the ICP calculation procedure can be found in [7]. 



3 A Bi- Criteria Parallel Machine Scheduling Problem 

In this paper, we consider a parallel machine scheduling problem with two objectives 
to be minimized simultaneously: makespan and total weighted tardiness [2]. For the 
specific problem instance we take five identical machines in parallel and 100 jobs 
with release times ($r_j$), processing times ($p_j$), due dates ($d_j$), weights ($w_j$), and sequence 
dependent setup times ($s_{jk}$) to be assigned to the machines. Jobs are classified into 4 
different families. A job can only be assigned to a single machine since preemption 
of a job before its completion is not allowed. 

Makespan is equal to the completion time of the last job in the system, which can 
be defined as $\max\{C_1, C_2, \ldots, C_n\}$ where $C_j$ is the completion time of job $j$. A 
minimum makespan implies a high utilization of the machines. Total weighted 
tardiness is the weighted sum of tardiness of $n$ jobs. Hence, the total weighted 
tardiness (TWT) objective can be defined as $\sum_{j=1}^{n} w_j T_j$, where $T_j = \max\{0, C_j - d_j\}$. 

Minimum total weighted tardiness indicates a high customer satisfaction level due to 
the on-time delivery of jobs. The release time of job j is the earliest time at which job 
j can be processed. The processing time of job j is the time that is required to finish 
job j on any machine. In our example, the processing time of job j does not depend on 
the machine chosen. The due date of job j can be considered as the committed 
customer delivery date of job j. The completion of job j after its due date is allowed, 
but a penalty is incurred. The weight of job j represents the importance or priority 
level of job j. If the weight of job j is higher than others, then this implies that job j 
has a higher priority than others. The sequence dependent setup time is the time 
needed on any machine if the next job k on the same machine has a different family 
than the preceding job j. 
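The two objectives can be evaluated for a given assignment and sequencing with a short Python sketch. It assumes that a setup is charged whenever consecutive jobs on a machine belong to different families, and that the setup is performed before the release-time check (so it may overlap with waiting); the data layout, the three-job instance and the function name are hypothetical.

```python
def evaluate_schedule(machines, jobs, setup):
    """Return (makespan, total weighted tardiness) of a parallel-machine schedule.

    machines: one job sequence per machine.
    jobs[j]:  dict with release time r, processing time p, due date d,
              weight w and family.
    setup:    dict mapping (previous family, next family) to a setup time.
    """
    makespan, twt = 0, 0
    for sequence in machines:
        t, previous = 0, None
        for j in sequence:
            job = jobs[j]
            if previous is not None and jobs[previous]["family"] != job["family"]:
                t += setup[(jobs[previous]["family"], job["family"])]
            t = max(t, job["r"]) + job["p"]          # completion time C_j
            twt += job["w"] * max(0, t - job["d"])   # w_j * T_j
            previous = j
        makespan = max(makespan, t)
    return makespan, twt

jobs = {
    1: {"r": 0, "p": 3, "d": 4, "w": 2, "family": "A"},
    2: {"r": 1, "p": 2, "d": 5, "w": 1, "family": "B"},
    3: {"r": 0, "p": 4, "d": 4, "w": 3, "family": "A"},
}
setup = {("A", "B"): 1, ("B", "A"): 2}
result = evaluate_schedule([[1, 2], [3]], jobs, setup)
```

A genetic algorithm for this problem would call such an evaluation once per candidate schedule and pass the resulting pair to its selection scheme.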

Hence, the decisions to be made are the assignment of jobs to machines and 
the scheduling of these jobs on each individual machine. Minimizing the makespan on each 
single machine with sequence dependent setup times is equivalent to a Traveling 
Salesman Problem (TSP), which is strongly NP-hard [11]. The single machine TWT 
problem is strongly NP-hard [11], which makes the parallel machine TWT problem also 
strongly NP-hard. Thus the parallel machine scheduling problem considered in this 
paper is strongly NP-hard, since it would be strongly NP-hard if each objective were 
optimized separately [11]. 



4 Hybrid Genetic Algorithms (MPGA, MOGA) for Parallel Machine Scheduling Problems 

For multiple objective parallel machine scheduling problems such as the one we are 
considering in this paper, several approximation algorithms have been proposed. In 
this paper, we use two genetic algorithms: Multi-Population Genetic Algorithm 
(MPGA) developed by Cochran et al. [2] and Multi-Objective Genetic Algorithm 
(MOGA) suggested by Murata et al. [9]. 

Schaffer [12] proposed the Vector Evaluated Genetic Algorithm (VEGA) method 
to find the efficient frontier for general multiple objective problems. Using this 
method, a population is divided into disjointed sub-populations and a sub-population 
is optimized with respect to each objective. VEGA, by the nature of its disjointing 
approach (vector optimization), tends to form the extreme solutions of the efficient 
frontier since its search direction is unidirectional. 

Murata et al. [9] proposed the Multi-Objective Genetic Algorithm (MOGA), which 
selects individuals for a crossover operation, based on a weighted sum of linear 
objective functions with variable weights that are not constant but randomly specified 
for each generation. That is, the fitness value of solution x in each generation, fix) is 
equal to 

K (3) 

fix) = ^aifiix) , 
i=l 

where K is the number of objectives, a. is the weight for each objective i, and fix) is 
the t objective value of solution x, and x is in the set X of all solutions considered. 
The selection probability function is equal to 

ifix)-f*) 

I ifix)-f*) 

where / * is the minimum fitness value in each generation. With these variable 
weights, MOGA is essentially searching in random directions. The authors argued 
that the method generally produces more diverse non-dominated solutions than 
VEGA, providing a broader choice of solutions to the decision maker. They applied 
MOGA to solve a multiple criteria flow shop scheduling problem with two and three 
objectives (makespan, total weighted tardiness, and total weighted completion time). 
The solution quality of the set of near Pareto-optimal solutions generated by MOGA was
compared with that of VEGA, and the authors showed by visual comparison that MOGA
generates better Pareto fronts than VEGA. However, as pointed out by Coello
[3], MOGA has the disadvantage of missing non-supported solutions of the efficient
frontier, since a weighted sum of objectives is used as the fitness function. Also,
Cochran et al. [2] pointed out that MOGA focuses on generating diverse solutions
but does not consider the convergence of the solutions; hence it may be time
consuming to obtain good approximate solutions.
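The random-weight fitness of Eq. (3) and the selection probability that follows it can be sketched as below. This is a minimal illustration, not the implementation of [9]: the function names are hypothetical, normalizing the random weights so that they sum to 1 is an assumption, and so is the uniform fallback when all fitness values coincide. The selection formula is applied exactly as printed, so larger values of f(x) receive larger probabilities; how the minimization objectives are mapped to fitness is not stated in this section.

```python
import random


def random_weights(k, rng):
    """Draw k non-negative weights summing to 1 (assumed normalization).

    MOGA redraws the weights every generation, so the search direction changes.
    """
    raw = [rng.random() for _ in range(k)]
    total = sum(raw)
    return [r / total for r in raw]


def weighted_sum_fitness(objectives, weights):
    """Eq. (3): f(x) = sum_i a_i * f_i(x)."""
    return sum(a * f for a, f in zip(weights, objectives))


def selection_probabilities(fitness):
    """Probability proportional to f(x) - f*, where f* is the generation minimum."""
    f_star = min(fitness)
    gaps = [f - f_star for f in fitness]
    total = sum(gaps)
    if total == 0:
        # All solutions have the same fitness: fall back to uniform (assumption).
        return [1.0 / len(fitness)] * len(fitness)
    return [g / total for g in gaps]
```

For example, fitness values 1, 2 and 4 give gaps 0, 1 and 3 from the minimum, hence selection probabilities 0, 0.25 and 0.75.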
MPGA is a hybrid genetic algorithm for solving multiple criteria parallel machine
scheduling problems with sequence dependent setups. The genetic algorithm is used
to assign jobs to machines, after which dispatching rules such as setup avoidance and
apparent tardiness cost with setups are used to sequence the jobs assigned to the
individual machines [11]. The method combines an aggregating function method and
the VEGA method. The motivation of MPGA is to achieve both diversity and convergence
of solutions by using two stages. In [2], six aggregating functions aimed at generating
good compromise solutions were tested. For the first stage, a
multiplicative function $f(x)$ was selected through experiments. The aggregating function
$f(x)$ is defined as

\[ f(x) = \prod_{i=1}^{K} \frac{f_i(x)}{f_i^*}, \tag{5} \]

where $f(x)$ is the fitness value of solution $x$, $f_i(x)$ is the $i$-th objective value of solution $x$,
and $f_i^*$ is the minimum value of the $i$-th objective in each generation. Then, the
selection probability function, which is equal to

\[ \frac{\left( f^w - f(x) \right)^2}{\sum_{x \in X} \left( f^w - f(x) \right)^2}, \]

where $f^w$ is the worst fitness value in each generation, is used to select chromosomes
for a crossover operation in that generation. In the second stage, MPGA adopts the
approach of VEGA to increase the diversity of solutions starting from good compromise
solutions. To do this, the solutions of the first stage are rearranged and divided into p
+ 1 sub-populations (one for each of the p objectives and one for the combined objective
function), which are the initial populations of the second stage. Each sub-population
evolves separately (similar to the VEGA approach). Cochran et al. [2] also sought to
find the best time to change between the two stages (called the turning point).
Through experimentation, they determined that a fixed turning point of 2000
generations works well [2]. MPGA outperformed MOGA when compared using the
measures of 1) number of Pareto-optimal solutions and 2) number of combined
Pareto-optimal solutions [2].
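The first-stage fitness of MPGA and its selection probability can be illustrated with a small sketch. It assumes that all objectives are minimized, that every per-objective minimum is strictly positive (a zero minimum, for example zero tardiness, would need separate handling), and that the selection probability uses the squared gap to the worst fitness; the function names are hypothetical and this is not the code of [2].

```python
import math


def multiplicative_fitness(objectives, best_per_objective):
    """Eq. (5): f(x) = prod_i f_i(x) / f_i*, smaller is better."""
    return math.prod(f / f_star for f, f_star in zip(objectives, best_per_objective))


def mpga_selection_probabilities(population):
    """population: list of objective vectors, one vector per solution.

    Returns one selection probability per solution, proportional to
    (f_worst - f(x))^2 within the current generation.
    """
    k = len(population[0])
    # f_i*: minimum of objective i over the current generation.
    best = [min(sol[i] for sol in population) for i in range(k)]
    fitness = [multiplicative_fitness(sol, best) for sol in population]
    f_worst = max(fitness)
    gaps = [(f_worst - f) ** 2 for f in fitness]
    total = sum(gaps)
    if total == 0:
        # Identical fitness everywhere: uniform selection (assumption).
        return [1.0 / len(population)] * len(population)
    return [g / total for g in gaps]
```

With the three solutions (10, 100), (20, 50) and (20, 100), the fitness values are 2, 2 and 4, so the worst solution is never selected and the other two are selected with probability 0.5 each.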

In both MPGA and MOGA, a tentative set of non-dominated solutions is stored 
and updated at every generation. A certain number of individuals randomly selected 
from the tentative set of non-dominated solutions are preserved as elite individuals. 
After the stopping criterion is met, a final set of non-dominated solutions remains. 
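The bookkeeping of the tentative non-dominated set can be sketched as follows. The sketch assumes that both objectives are minimized and that solutions are represented only by their objective vectors; the names `update_archive` and `pick_elites` are illustrative and do not come from [2] or [9].

```python
import random


def dominates(a, b):
    """True if a is no worse than b in every objective and better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def update_archive(archive, candidates):
    """Merge the current generation into the tentative non-dominated set."""
    archive = list(archive)
    for c in candidates:
        if any(dominates(a, c) or a == c for a in archive):
            continue  # c is dominated or already stored
        # c enters the archive and evicts everything it dominates.
        archive = [a for a in archive if not dominates(c, a)]
        archive.append(c)
    return archive


def pick_elites(archive, n_elite, rng):
    """Randomly choose elite individuals from the tentative set."""
    return rng.sample(archive, min(n_elite, len(archive)))
```

Calling `update_archive` once per generation and `pick_elites(archive, 3, rng)` before reproduction corresponds to the elitism setting of three elite solutions used in Section 5.1.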



5 Experiments and Results 

In this section we provide a description of the parameter settings used for our 
experiments, the problem instances generated for testing, and the performance 
measures evaluated. We then present the results of our analysis. 
5.1 Parameter Settings for Each Algorithm

In [2], preliminary experiments were performed to find the best parameter settings for 
both algorithms, since the performance of each algorithm is dependent on the 
parameter settings used. We use the same parameter settings as in [2], listed as 
follows: 

• Crossover probability: 0.6 

• Mutation probability: 0.01 

• Population size: 20 

• Elitism: three elite solutions are selected from the tentative set of non-dominated 
solutions 

• Stopping criterion: 5000 generations

For MPGA, the turning criterion is set at the 2000th generation. After the turning
criterion has been reached, the population is divided into three sub-populations, one
for each of the two objectives and one for the combined objective function.



5.2 Problem Instances 

As shown in Table 1, four factors are used to generate problem instances with 100 
jobs each. A total of 36 sets of problems are generated using these four factors. Ten 
problem instances are generated randomly in each set, resulting in 360 test problems. 
All problems are replicated 10 times due to the inherent randomness of the genetic 
algorithms, yielding 3600 total instances. 
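The size of the experiment follows directly from the factor levels. The short sketch below enumerates the problem sets; it assumes that the four-digit set codes used later (for example 2212) list the factor levels in the order of Table 1, and the dictionary keys are illustrative names.

```python
from itertools import product

# Levels of the four factors, in the order they appear in Table 1.
FACTOR_LEVELS = {
    "weight_range": (1, 2),
    "due_date_range": (1, 2),
    "process_setup_ratio": (1, 2, 3),
    "wip_status": (1, 2, 3),
}

# One four-digit code per combination of levels, e.g. "2212".
problem_sets = ["".join(map(str, combo)) for combo in product(*FACTOR_LEVELS.values())]

INSTANCES_PER_SET = 10  # random instances generated per problem set
REPLICATES = 10         # runs per instance, each with a different seed

n_instances = len(problem_sets) * INSTANCES_PER_SET
n_runs = n_instances * REPLICATES
print(len(problem_sets), n_instances, n_runs)
```

The three printed counts are 2 x 2 x 3 x 3 = 36 problem sets, 360 test problems and 3600 runs per algorithm.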



5.3 Performance Measures 

Our purpose is not only to compare the solution quality of two competing algorithms 
to solve a strongly NP-hard problem, but also to compare the results by different 
measures and provide evidence of the effectiveness of the ICP functional. Since no
efficient algorithm exists to generate the true efficient frontier for the considered
scheduling problem, measures that require the true frontier [4, 5, 6, 15] are not applicable,
whereas the ICP measure was developed on exactly this premise. The
following two types of ICP measures are used in this study: (1) ICP with non-scaled 
objective values (ICP_U) and (2) ICP with scaled objective values (ICP_U_S). For 
analysis purposes, the comparison results given by [2] are provided in the same table. 
When using ICP measures, we assume that the decision maker’s value function can be 
represented as a convex combination of objective values (even if the exact weight for 
each objective is not known) and the decision maker’s weight for each objective is 
uniformly distributed. We note, however, that the use of the ICP measure is not 
restricted by the assumption of weighted sum scalarization; the measure can be used 
with any type of fitness value structure. 
Table 1. Four factors and levels to generate 36 (2^2 x 3^2) problem sets (Adapted from [2])

Factor               Level          Description
Range of weights     1 (Narrow)     U(1,10)
                     2 (Wide)       U(1,20)
Range of due dates   1 (Narrow)     Release time + U(-1,2) x total process time
                     2 (Wide)       Release time + U(-2,4) x total process time
Ratio (p/s)          1 (High)       50/10, p = 50 + U(-9,9), s = 2 * U(3,7)
                     2 (Moderate)   30/30, p = 30 + U(-9,9), s = 6 * U(3,7)
                     3 (Low)        10/50, p = 10 + U(-9,9), s = 10 * U(3,7)
WIP status           1 (High)       All jobs are ready at time 0
                     2 (Moderate)   50% of jobs are ready at time 0 and the others
                                    are ready at time U(0,720)
                     3 (Low)        All jobs are ready at time U(0,720)

Notes:
U(a, b): random number generated from a uniform distribution between a and b.
p: average process time.
s: average setup time.
Total process time: process time plus average setup time.
Average setup time of job j: (1/m) x (sum over k of s_ik), where s_ik is the setup time from a job
of family i to a job of family k, i is the family of job j, and k = 1,2,...,n.



5.4 Experimental Results and Analysis 

In this section, we provide comparison results of the ICP measure and other measures 
in the literature first. We then present the results and analysis of the comparison of the 
two genetic algorithms. 

5.4.1 Comparison of ICP and Other Measures 

Here we provide a summary of the results presented in [7] on the comparison of the 
ICP measure to other measures in the literature. We believe that the results of any 
comparison method should be consistent with the results of visual comparison on 
clear instances. 

Consider the following 10 replicates (i.e., solutions using different seed values) 
using each of the two heuristics, shown in Figure 2. Set 1 and set 2 represent 
approximations of the efficient frontier generated by MOGA and MPGA, 
respectively. Based on a visual comparison, we can conclude that set 1 is better than 
set 2 in replicates 5, 6, and 9; set 2 is better than set 1 in replicates 1 and 4; these 
replicates will be referred to as “clear cases”. In replicates 2, 3, and 7, set 1 has better
solution points in the tail areas (for each objective) while set 2 has better solution 
points in the elbow area (for compromised objectives). In replicate 8, set 1 has better 
solutions for objective 2, and in the elbow area, set 1 and set 2 have similar solutions 
for objective 1, so it is hard to say which solution set would be preferable. In replicate 
10, the solution sets are very close, resulting in a (visual) tie. Hence, replicates 2, 3, 7, 
8, and 10 will be referred to as “unclear cases”. 







Table 2 summarizes the results of comparing a visual method, the ICP measure, 
and measures that count the number of Pareto-optimal solutions (# of POS) and the 
number of Pareto-optimal solutions from that algorithm when the two solution sets 
are combined (also called number of combined Pareto-optimal solutions, or # of 
CPOS). The symbol ‘?’ is used to indicate that the result of the visual comparison is 
not clear. Note that sets of solutions with high values of the two cardinality measures 
(# of POS and # of CPOS) and low values of the ICP measure are preferred. The 
preferred solution set found by each comparison method is marked in bold. 
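The two cardinality measures can be computed as in the following sketch. It assumes minimization of all objectives and that a point appearing in both sets counts for both; the function names and the sample points are illustrative and are not taken from [2] or [7].

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def count_pos(own):
    """# of POS: points of a set that are non-dominated within that set."""
    points = set(own)
    return sum(1 for p in points if not any(dominates(q, p) for q in points))


def count_cpos(own, other):
    """# of CPOS: points of `own` that stay non-dominated in the combined set."""
    combined = set(own) | set(other)
    return sum(1 for p in set(own) if not any(dominates(q, p) for q in combined))


set1 = [(1, 9), (4, 6), (8, 2)]
set2 = [(2, 7), (3, 5), (9, 1)]
print(count_pos(set1), count_pos(set2))                # 3 3
print(count_cpos(set1, set2), count_cpos(set2, set1))  # 2 3
```

In this toy example both sets have the same # of POS, but (4, 6) is dominated by (3, 5) once the sets are merged, so set 2 has the higher # of CPOS.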



[Figure 2: ten panels, labelled Replicate 1 through Replicate 10]

Fig. 2. Sets of near Pareto-optimal solutions generated by the two algorithms for 10 replications
of one specific problem instance. Set 1 and set 2 are plotted with different markers (‘o’ represents
set 2), and each grid is 0.2. (Adapted from [7])

An analysis of Table 2, as in [7], shows that the two cardinality-based measures
have clear shortcomings. They can miss the general structure of the solution
set and, especially in the case of # of POS, can favor a solution set that is not
necessarily closer to the true efficient frontier.

In all of the clear cases, ICP gives results that are identical to those of the visual
comparison. For replications that yield a tie in the visual comparison (e.g.,
Replication #10), the ICP measures of set 1 and set 2 are very close. Also, from the
fact that the minimum value of ICP is 0.05, the best set of near Pareto-optimal
solutions among the 20 can easily be found in Table 2 (see the graph of set 1 in
replicate 9 in Figure 2). In the unclear cases, set 1 wins in replicates 8 and 10 and set
2 wins in replicates 2, 3, and 7 according to the ICP measure.

From the above results, we can see that the ICP measure provides the comparison
result closest to the visual comparison, with a scalar value. One shortcoming of the ICP measure
presented here is that it uses only efficient extreme points in evaluating a set of
Pareto-optimal solutions, since it assumes the decision maker’s value function is a
convex combination of objective functions. In this case, ICP cannot differentiate sets
of Pareto-optimal solutions that have the same efficient extreme solutions but
different non-supported solutions. This limitation can be overcome by using different 
types of value functions. For example, if a weighted Tchebycheff metric is assumed, 
then all Pareto-optimal solutions in a set will be considered in evaluating the solution 
quality of sets of near Pareto-optimal solutions. 









Table 2. Numerical comparison of solution set 1 and set 2 (Adapted from [7])

Rep. #   Visual        # of POS         # of CPOS        ICP              ICP-
         comparison    set 1   set 2    set 1   set 2    set 1   set 2    difference
1        Set 2         14      9        2       9        0.21    0.17      0.04
2        ?             21      5        15      4        0.17    0.16      0.01
3        ?             22      8        10      7        0.15    0.12      0.03
4        Set 2         18      9        1       9        0.21    0.10      0.11
5        Set 1         11      12       11      0        0.08    0.26     -0.18
6        Set 1         14      13       14      0        0.14    0.18     -0.04
7        ?             12      7        7       5        0.15    0.13      0.02
8        ?             9       8        8       4        0.12    0.16     -0.04
9        Set 1         10      10       10      2        0.05    0.15     -0.10
10       ?             11      12       10      5        0.20    0.21     -0.01
Wins     -             7       2        8       2        5       5         5



5.4.2 The Use of ICP to Compare MPGA and MOGA 

Our experimental results are summarized in Table 3, where the column ‘Problem 
instance set’ represents the combination of the levels of the four factors. Values in 
both ‘ICP_*’ columns are the number of wins of MPGA (over MOGA) out of 100 
comparisons (10 randomly generated problems * 10 replicates) in the corresponding 
problem sets using the corresponding ICP measures. In the ‘# Pareto-optimal’
column, the average numbers of non-dominated solutions generated by MPGA and
MOGA are shown, respectively. In the ‘# combined Pareto’ column, the average
combined numbers of non-dominated solutions generated by MPGA and MOGA are
shown. In the ‘win’ column, ‘1’ means MPGA wins over 50 times, ‘0’ means MOGA
wins over 50 times, and ‘0.5’ means the algorithms are tied. In the ‘Total’ row, values
in the two ‘ICP’ columns indicate the number of wins of MPGA out of 3,600 
comparisons, and values in the four ‘win’ columns indicate the number of wins of 
MPGA out of 36 problem instance sets. Finally in the ‘Ratio’ row, the values given 
are the ratio of the number of wins of MPGA out of the total number of comparisons. 
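As a quick check of the ‘Ratio’ row, the totals reported in Table 3 can be divided by the number of comparisons (3,600) or by the number of problem sets (36). The variable names below are illustrative; the input numbers are the ‘Total’ row of Table 3.

```python
N_COMPARISONS = 3600  # 36 problem sets x 10 instances x 10 replicates
N_SETS = 36

# 'Total' row of Table 3: wins of MPGA counted over individual comparisons ...
total_wins = {"ICP_U": 2249.5, "ICP_U_S": 1941.5}
# ... and over problem sets (a tie counts as 0.5).
set_wins = {
    "ICP_U": 32,
    "ICP_U_S": 19.5,
    "# Pareto-optimal": 36,
    "# combined Pareto": 32,
}

comparison_ratio = {m: round(w / N_COMPARISONS, 2) for m, w in total_wins.items()}
set_ratio = {m: round(w / N_SETS, 2) for m, w in set_wins.items()}
print(comparison_ratio)
print(set_ratio)
```

The resulting values, 0.62 and 0.54 for the comparison counts and 0.89, 0.54, 1.0 and 0.89 for the problem-set counts, agree with the six ratios reported in Table 3.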

We can first see that MPGA outperforms MOGA in overall performance, since all
six ratios in Table 3 are greater than or equal to 0.5. However, the number of wins for
MPGA differs considerably across the measures. When the number of Pareto-optimal
solutions measure is used, MPGA wins in all of the problem sets, but when
the combined number of Pareto-optimal solutions measure is used, MPGA wins in 32
problem sets. These results indicate that the VEGA method embedded in MPGA plays a
role in increasing the number of non-dominated solutions. When ICP_U is used,
MPGA wins in 32 of the problem sets, and when ICP_U_S is used MPGA wins 19
times and ties once out of 36.

These results indicate that the problem instances employed in this study have
scaling differences. For example, for problem instance set 2212 the makespan objective
values range from 1,000 to 1,300, while the total weighted tardiness objective values
range from 200,000 to 350,000. Hence, when ICP_U is used, the narrow ranged objective
(makespan) is nullified by the wide ranged objective (total weighted tardiness). We
can see that MPGA tends to generate solutions that are better in the total weighted
tardiness objective. Also, as indicated in [7], ICP_U_S gives results that are closer to
results from visual comparisons than the other measures. Hence, further analysis is
performed using the results by ICP_U and ICP_U_S.
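The effect of scaling differences on a weighted-sum value function can be seen in a small example. The objective ranges are those quoted above for problem set 2212; the two solutions `a` and `b`, the equal weights, and the use of min-max scaling are assumptions made only for illustration, since the scaling behind ICP_U_S is not described in this section.

```python
def min_max_scale(value, lo, hi):
    """Map value from [lo, hi] to [0, 1]."""
    return (value - lo) / (hi - lo)


def weighted_sum(values, weights):
    return sum(w * v for w, v in zip(weights, values))


# (makespan range, total weighted tardiness range) quoted for set 2212.
RANGES = [(1000.0, 1300.0), (200000.0, 350000.0)]

# Two hypothetical solutions: a has the best makespan, b has slightly better tardiness.
a = (1000.0, 350000.0)
b = (1300.0, 349000.0)
weights = (0.5, 0.5)


def scaled(solution):
    return [min_max_scale(v, lo, hi) for v, (lo, hi) in zip(solution, RANGES)]


unscaled_value = {"a": weighted_sum(a, weights), "b": weighted_sum(b, weights)}
scaled_value = {"a": weighted_sum(scaled(a), weights), "b": weighted_sum(scaled(b), weights)}
print(unscaled_value)
print(scaled_value)
```

Without scaling, b looks better (175150 against 175500) although its makespan is 30% worse, because the tardiness term dominates the sum. After scaling, a is preferred, since a small tardiness gain no longer outweighs the full makespan range.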



Table 3. Number of wins of MPGA out of 100 comparisons in each of the 36 problem instance sets

Problem     ICP comparison results            Comparison results by Cochran et al. [2]
instance                                      # Pareto-optimal       # combined Pareto
set         ICP_U    win   ICP_U_S   win      MPGA   MOGA   win      MPGA   MOGA   win
1111        62       1     50        0.5      11.9   9.4    1        6.6    5.2    1
1112        67       1     61        1        12.0   9.5    1        8.2    4.2    1
1113        81       1     77        1        10.7   8.4    1        7.9    2.5    1
1121        55       1     47        0        11.7   8.8    1        6.0    5.4    1
1122        65       1     56.5      1        5.7    4.6    1        3.5    1.9    1
1123        66       1     62        1        7.0    5.4    1        4.4    2.3    1
1131        52       1     46        0        12.6   8.6    1        5.8    5.6    1
1132        44       0     42        0        1.8    1.2    1        0.6    0.7    0
1133        58       1     44        0        2.9    1.9    1        1.2    1.1    1
1211        59       1     53        1        12.0   9.7    1        7.3    4.7    1
1212        70       1     62        1        14.1   9.5    1        9.5    4.0    1
1213        84       1     69        1        10.8   8.9    1        7.9    3.0    1
1221        57       1     45        0        11.4   9.2    1        6.0    5.4    1
1222        73       1     60        1        6.7    4.8    1        3.9    2.0    1
1223        71       1     55        1        7.9    5.3    1        5.0    2.5    1
1231        52       1     49        0        12.5   9.5    1        5.7    6.1    0
1232        54       1     49        0        2.1    1.3    1        0.8    0.7    1
1233        48       0     38        0        4.3    1.9    1        1.5    1.3    1
2111        58       1     53        1        12.2   8.9    1        6.6    4.8    1
2112        76       1     71        1        13.3   9.4    1        9.7    3.7    1
2113        72       1     62        1        11.5   8.8    1        7.1    3.7    1
2121        64       1     54        1        11.9   8.7    1        6.7    4.5    1
2122        74       1     62.5      1        5.7    4.8    1        3.7    1.4    1
2123        63       1     55        1        8.1    5.4    1        4.5    2.7    1
2131        55       1     41        0        12.6   9.3    1        6.0    6.0    0
2132        51       1     47        0        1.6    1.3    1        0.7    0.7    1
2133        62       1     44        0        3.3    2.0    1        1.3    1.1    1
2211        53       1     51        1        12.4   9.3    1        6.9    4.9    1
2212        82       1     73        1        13.8   9.4    1        10.7   3.2    1
2213        80       1     74        1        11.9   9.2    1        8.9    3.0    1
2221        59       1     49        0        11.7   8.6    1        6.5    4.7    1
2222        73       1     65        1        7.5    5.2    1        5.0    1.8    1
2223        60       1     49        0        8.5    5.6    1        4.7    2.9    1
2231        45       0     45        0        12.9   9.6    1        6.1    6.0    1
2232        45       0     41        0        1.9    1.3    1        0.7    0.8    0
2233        59.5     1     39.5      0        4.2    2.0    1        1.8    1.0    1
Total       2249.5   32    1941.5    19.5     -      -      36       -      -      32
Ratio       0.62     0.89  0.54      0.54     -      -      1.00     -      -      0.89



In Table 3, we see a clear trend. When the process time/setup time ratio (3rd factor)
is ‘3’ (low), MPGA always loses under the ICP_U_S measure. This indicates that the
solution qualities of MPGA and MOGA depend on the factor levels. To perform the
factor analysis, Table 4 is
derived from Table 3. As shown in Table 4, MPGA wins across all weight range and
due date range factor levels. However, when the level of the process-setup ratio is ‘3’
or the WIP ratio factor is ‘1’, we cannot say that MPGA outperforms MOGA.



Table 4. Number of wins of MPGA by levels of the four factors

Factor                Level   ICP_U    win   ICP_U_S   win
Weight range          1       1118     1     965.5     1
                      2       1131.5   1     976       1
Due date range        1       1125     1     975       1
                      2       1124.5   1     966.5     1
Process-setup ratio   1       844      1     756       1
                      2       780      1     660       1
                      3       625.5    1     525.5     0
WIP ratio             1       671      1     583       0
                      2       774      1     690       1
                      3       804.5    1     668.5     1



Note: In ‘win’ columns, ‘1’ means MPGA wins over 900 times out of 1800 comparisons in 
weight range factor and due date range factor rows and MPGA wins over 600 times out of 1200 
comparisons in the process-setup time ratio factor and the WIP status factor rows. 
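The derivation of Table 4 from Table 3 amounts to summing the per-set wins over all problem sets that share a factor level. The sketch below copies the ICP_U and ICP_U_S columns of Table 3; the function and factor names are illustrative, and it assumes that the digits of each set code follow the factor order of Table 1.

```python
from collections import defaultdict

# Wins of MPGA per problem set, copied from Table 3: code -> (ICP_U, ICP_U_S).
TABLE3 = {
    "1111": (62, 50), "1112": (67, 61), "1113": (81, 77),
    "1121": (55, 47), "1122": (65, 56.5), "1123": (66, 62),
    "1131": (52, 46), "1132": (44, 42), "1133": (58, 44),
    "1211": (59, 53), "1212": (70, 62), "1213": (84, 69),
    "1221": (57, 45), "1222": (73, 60), "1223": (71, 55),
    "1231": (52, 49), "1232": (54, 49), "1233": (48, 38),
    "2111": (58, 53), "2112": (76, 71), "2113": (72, 62),
    "2121": (64, 54), "2122": (74, 62.5), "2123": (63, 55),
    "2131": (55, 41), "2132": (51, 47), "2133": (62, 44),
    "2211": (53, 51), "2212": (82, 73), "2213": (80, 74),
    "2221": (59, 49), "2222": (73, 65), "2223": (60, 49),
    "2231": (45, 45), "2232": (45, 41), "2233": (59.5, 39.5),
}

FACTORS = ("weight range", "due date range", "process-setup ratio", "WIP")


def wins_by_level(table):
    """Sum the (ICP_U, ICP_U_S) wins over all sets sharing a factor level."""
    totals = defaultdict(lambda: [0.0, 0.0])
    for code, (icp_u, icp_u_s) in table.items():
        for factor, level in zip(FACTORS, code):
            totals[(factor, int(level))][0] += icp_u
            totals[(factor, int(level))][1] += icp_u_s
    return {key: tuple(value) for key, value in totals.items()}


def win_flag(wins, comparisons):
    """1 if MPGA wins more than half of the comparisons, 0 if fewer, 0.5 on a tie."""
    half = comparisons / 2
    return 1 if wins > half else 0 if wins < half else 0.5


by_level = wins_by_level(TABLE3)
print(by_level[("process-setup ratio", 3)])  # (625.5, 525.5)
print(win_flag(525.5, 1200))                 # 0
```

The same grouping applied to pairs of factors (process-setup ratio and WIP) gives the finer breakdown used for the sensitive factors.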



For further analysis of the relation between the performance of MPGA and 
process-setup and WIP ratio factors, we develop Table 5 from the information in 
Table 3. As shown in Table 5, when the level of the process-setup ratio is ‘3’(low), or 
when the level of WIP ratio is ‘1’, it is difficult to say which one generates better 
Pareto fronts. MOGA outperformed MPGA when the level of process-setup ratio is 
‘3’(low) and the level of WIP ratio is ‘2’(moderate) in ICP_U_S measure. In all other 
cases, MPGA outperforms MOGA in both ICP_U and ICP_U_S measures. 

One of the reasons for this problem-dependent performance can be inferred from
the algorithmic characteristics of MPGA. In MPGA, the population is divided after
the specified turning criterion (the 2000th generation). After this point, each sub-population
evolves to improve the objective function assigned to it. This is one of the
reasons that MPGA outperformed MOGA in most cases under the cardinality
measures. However, if one objective reaches an optimal or near optimal solution
before the stopping criterion of the algorithm (the 5,000th generation), then the
sub-population assigned to that objective has little chance to improve its solutions. This
happens, for instance, when one objective is significantly easier to optimize than the
others.

As shown in Table 1, when the level of the process time to setup time ratio is 3 (low),
the process times of the 100 jobs are randomly generated from U(1,19) and the setup
times are generated from U(30,70). In this case, the makespan objective is much more
dependent on the setup times than on the process times. As stated before, there are 4 families
of jobs and 5 identical machines. Thus, near optimal schedules for the makespan
objective can be obtained easily (i.e., by assigning the jobs of the same family to the
same machine). Once a job sequence that approximately satisfies this condition is
found by the genetic algorithm (through crossover or mutation), there is
not much room for improving the solutions. There is then more of a chance that the
makespan objective reaches a (near) optimal solution within fewer generations for
level 3 than for level 1 or level 2 of the ratio of processing time to setup time.
Similarly, when the level of WIP is 1 (high), the release time (r) of all 100 jobs is 0. In
this case, the makespan objective without release times seems to be easier to optimize
than with release times.



Table 5. Number of wins of MPGA by levels for sensitive factors

Process-setup   WIP
time ratio      ratio   ICP_U    win   ICP_U_S   win
1               1       232      1     207       1
1               2       295      1     267       1
1               3       317      1     282       1
2               1       235      1     195       0
2               2       285      1     244       1
2               3       260      1     221       1
3               1       204      1     181       0
3               2       194      0     179       0
3               3       227.5    1     165.5     0



Note: In ‘win’ columns, ‘1’ means MPGA wins over 200 times out of 400 comparisons. 



6 Conclusions and Future Research 

We have provided a detailed, real-world example that illustrates the use of a new 
quantitative measure for comparing heuristic algorithms for multiple objective 
combinatorial optimization problems. Two competing genetic algorithms (MPGA 
and MOGA) are compared by two ICP measures for 360 problem instances of bi- 
criteria scheduling problems. Overall, both ICP and cardinality measures indicate that 
MPGA generates better sets of approximate solutions than MOGA. Also, we can see 
that the solution qualities of both algorithms are dependent on the problem instances. 
Even though MPGA generates a larger number of non-dominated solutions in most
cases, MOGA can work better than MPGA in terms of the scaled ICP measure when
one objective can be optimized more easily than the other objective, because MPGA
may waste sub-populations assigned to improve an objective function for which a
near optimal solution has already been found.

In developing multiple objective genetic algorithms, parameter tuning and 
choosing a selection scheme are often performed for a limited number of problem 
instances. From our experimental results, we conclude that preliminary experiments 
on a wide range of problem instances are needed since different parameters can 
perform better for different problem instances. Hence, automatic (dynamic) parameter 
setting and automatic choosing of a selection scheme depending on the problem and 
problem instances in a computerized decision support system seems to be a promising 
avenue of research. However, reliable and easily obtainable performance measures are 
required to do this. 

Further research in this area is expected to be fruitful. One example is that the various
selection schemes reviewed in [3] need to be compared for different practical multiple
objective optimization problems under the measures that have appeared in the literature. To
run these experiments and to tune parameters in each setting, experimental design and
response surface optimization methodology [10], which are frequently used in
stochastic process design and optimization, will likely be extremely useful. Such a study
will build intuition about the strengths and weaknesses of various selection schemes
and solution quality measures.
Abstract. In this communication a bi-criterion approach for the nominal Airlines
Crew Rostering Problem is developed. The nominal Crew Rostering
Problem considers the assignment of the crew staff to a set of pairings covering
all the scheduled flights so that operations costs are minimized, while its solution
must meet hard constraints resulting from the safety regulations of Civil
Aviation as well as from the airline's internal agreements. Another goal is of the
highest interest for airlines: the overall satisfaction of the crew staff, which may
have important consequences on the quality and on the economic return of the
operations.

In this communication, a new mathematical formulation of the crew scheduling 
problem which takes into account the satisfaction of the crew members is pro- 
posed. A heuristic approach, combined with a genetic algorithms technique, is 
adopted to produce reduced cost solutions associated to acceptable satisfaction 
levels for the crew staff. The application of the proposed approach to a medium 
size Airline Crew Rostering Problem is evaluated. 

Keywords. Multi-Criterion Optimization, Heuristics, Genetic Algorithms, Air- 
lines Operations, Crew Scheduling, Crew Rostering. 



1 Introduction 

For more than three decades now the Airlines Crew Scheduling Problem (ACSP) has
retained the attention of the Management and Operations Research community since
crew costs in air transportation are extremely high, amounting to 15-20% of total airline
operations costs. Therefore, airlines consider the efficient management of their
crew staff to be a question of the highest economic relevance. Unfortunately, the exact
numerical solution of the associated large scale combinatorial optimization problem is
very difficult to obtain. Early rules of thumb [13] have been quickly overrun by the
size of the practical problems encountered (hundreds or thousands of crew members to
be assigned to at least as many pairings) and by the complexity of the set of constraints
to be satisfied, leading very often to poor performance solutions. More recently,
with the enhancement of computer performance, optimization approaches
have been proposed to solve this problem: mathematical programming methods (large
scale linear programming and integer programming techniques) (see [3], [4], [9],
[10]), artificial intelligence methods (logical programming, simulated annealing, neural
networks, fuzzy logic and genetic algorithms) as well as heuristic approaches and
their respective combinations [11], [12], [15], [20]. Many studies refer to the nominal
ACSP which is a static decision problem, based on a monthly table of flights, and 
devoted exclusively to the minimization of airlines operations costs. This problem is 
in general split in two sub problems: a crew pairing problem where the set of pairings 
covering the programmed flights is defined and a crew rostering problem where the 
retained pairings are nominally assigned to the crew staff. 

In this paper, after discussing the airlines crew scheduling problem, the nominal 
Airline Crew Rostering Problem (ACRP) is introduced as a bi-criterion decision 
problem where the main decision criterion is the crew operations cost of the airline 
and the secondary decision criterion is relative to the crew staff overall degree of 
satisfaction. The solution approach proposed here is composed of two steps: in the 
first one a heuristic approach is designed to get a first set of high satisfaction assign- 
ment solutions and in the second one, an optimization process, based on genetic algo- 
rithms (GA) is developed. The application of this solution approach to a medium size 
ACRP is displayed. 



2 The Airline Crew Scheduling Problem

The Airline Crew Scheduling Problem (ACSP) is treated in general once the schedule
of the flights has been established for the next month and once the available fleet has
been assigned to the scheduled flights. Two classes of constraints are considered in
order to produce the “line of work” for the crew staff over the planning period: hard
constraints whose violation impairs the security of the flight (crew qualifications, national
regulations concerning duration of work and rest times, medical clearances,
training and license renewal requirements) and soft constraints (internal company
rules, agreements with unions regarding the crew’s working and remuneration conditions,
office duties, holidays and declared assignment preferences by the crew staff)
which are relevant to build the crew schedule but whose relaxation may lead to lower
cost solutions. While some of these soft constraints are common to most airlines,
others are only relevant for some classes of airlines and a few are specific to a
given airline. The primary objective sought by airlines at this level of decision making
is to minimize the crew related operations costs, so in most research studies, the ACSP
has been formulated as a mono-criterion minimization problem.

Since the ACSP belongs to the NP-hard computational complexity class [8], a
sub-optimal but widely accepted approach to tackle it more efficiently consists in
decomposing it into two sub-problems of lower difficulty. The first sub-problem, the
Airline Crew Pairing Problem (ACPP), involves the construction of an efficient set of
pairings (a pairing is a sequence of flights which starts and ends at the same airline base
while meeting all relevant legal regulations) which covers all the programmed flights.
The second sub-problem, the Airlines Crew Rostering Problem (ACRP), considers the
nominal assignment of the airline crew to the generated set of pairings over the
planned period so that an effective “line of work” is obtained for each staff member
(Fig. 1).

To obtain a solution to the ACRP in a simple way, most North American airlines
first adopted heuristics such as bidding processes, where the crew staff
is arranged by decreasing seniority and each crew member in turn builds his or her own
monthly “line of work” from the remaining pairings. This greedy heuristic approach
too often generates uneven workloads and so induces repeated dissatisfaction among
the crew staff. More recently, global approaches based on Mathematical Programming
techniques have been proposed to tackle the ACRP [9].
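The seniority-based bidding process can be sketched as a greedy loop. This is only an illustration of the idea described above: the data layout, the monthly hour limit, the overlap test and the `preference` scoring function are hypothetical and do not describe any particular airline's procedure.

```python
def seniority_bidding(crew_by_seniority, pairings, overlaps, max_hours, preference):
    """Greedy bidding: each crew member, most senior first, picks from what is left.

    crew_by_seniority: crew ids ordered by decreasing seniority.
    pairings: dict pairing id -> flying hours.
    overlaps: dict pairing id -> set of pairing ids that overlap it in time.
    preference: function (crew, pairing) -> score, higher means more desirable.
    Returns (rosters, unassigned pairings).
    """
    remaining = set(pairings)
    rosters = {}
    for crew in crew_by_seniority:
        line, hours = [], 0.0
        # Most preferred pairings first; ties broken by pairing id.
        for p in sorted(remaining, key=lambda q: (-preference(crew, q), q)):
            if hours + pairings[p] > max_hours:
                continue  # would exceed the monthly hour limit
            if any(q in overlaps.get(p, ()) for q in line):
                continue  # clashes with a pairing already chosen
            line.append(p)
            hours += pairings[p]
        remaining -= set(line)
        rosters[crew] = line
    return rosters, remaining


pairings = {"P1": 40, "P2": 40, "P3": 20, "P4": 10}
rosters, left = seniority_bidding(
    ["senior", "junior"], pairings, overlaps={}, max_hours=80,
    preference=lambda crew, p: pairings[p],  # assume everyone prefers long pairings
)
print(rosters, left)
```

In this toy run the senior crew member takes 80 flying hours and the junior one is left with 30, which illustrates the uneven workloads mentioned above.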




Fig. 1. The Airline Crew Scheduling Problem



3 A Mathematical Programming Approach of the Nominal ACRP 

In this section a standard formulation of the ACRP is analyzed and the direct intro- 
duction of a crew satisfaction index is considered. 
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3.1 Analysis and Formulation of the Nominal ACRP 



The nominal Airline Crew Rostering Problem has been formulated as a zero-one
integer mathematical programming problem [1], [13] where the crew operations cost
is the criterion to be minimized under a finite set of hard constraints:

$$\text{Minimize} \quad \sum_{j \in J} \sum_{i \in A_j} c_{ij}\, x_{ij}$$

Subject to

$$\sum_{i \in A_j} x_{ij} \ge 1 \qquad \forall j \in J \qquad (1)$$

$$x_{ij} + x_{ik} \le 1 \qquad \forall i \in I,\ \forall j \in J,\ \forall k \in O_j \qquad (2)$$

$$\sum_{j \in J} d_j\, x_{ij} \le LH \qquad \forall i \in I \qquad (3)$$

$$\sum_{j \in J} x_{ij} \le N_i^{\max} \qquad \forall i \in I \qquad (4)$$

$$x_{ij} \in \{0, 1\} \qquad \forall i \in I,\ \forall j \in J \qquad (5)$$

where $c_{ij}$ is the cost resulting from the assignment of pairing “j” to pilot
“i”, I is the set of the n pilots, J is the set of the m pairings to be covered
during the planning period, $A_j$ is the set of the pilots able to fly pairing
“j”, $O_j$ is the set of the pairings overlapping with pairing “j”, $d_j$ is the
amount of flying hours associated with pairing “j”, and the $x_{ij}$ are binary
variables such that $x_{ij} = 1$ if pairing “j” is assigned to pilot “i” and
$x_{ij} = 0$ otherwise.

The first set of constraints ensures that a crew is assigned to each pairing,
the inequality sign allowing crew deadheading (the transfer of off-duty crew
members to another base in order to carry out a planned flight). The second set of
constraints ensures that the same crew is not assigned to two overlapping
pairings. The third set of constraints ensures that the number of hours flown by a
pilot during the rostering period (a month in general) does not exceed an upper
limit LH. In the fourth set of constraints, $N_i^{\max}$ is the maximum number of
pairings that can be assigned to crew member “i” over a rostering period.
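
The four constraint sets described above can be checked mechanically for any
candidate roster. The following is only a minimal sketch under stated
assumptions: the function name and the argument names (`able`, `overlaps`,
`hours`, `max_hours`, `max_pairings`) are illustrative and do not come from the
original formulation.

```python
def is_feasible(x, able, overlaps, hours, max_hours, max_pairings):
    """Check a 0-1 assignment against the four hard constraint sets.

    x[i] is the set of pairings assigned to pilot i (x_ij = 1 iff j in x[i]).
    able[j] is the set of pilots able to fly pairing j, overlaps[j] the set of
    pairings overlapping with j, hours[j] the flying hours of pairing j.
    All argument names are illustrative assumptions.
    """
    for j in able:
        crews = [i for i in x if j in x[i]]
        # Covering: at least one crew per pairing, all of them able to fly it.
        if not crews or any(i not in able[j] for i in crews):
            return False
    for i, assigned in x.items():
        # The same pilot may not hold two overlapping pairings.
        if any(k in assigned for j in assigned for k in overlaps.get(j, ())):
            return False
        # Upper limit on flown hours over the rostering period.
        if sum(hours[j] for j in assigned) > max_hours:
            return False
        # Maximum number of pairings per crew member.
        if len(assigned) > max_pairings[i]:
            return False
    return True
```

Such a check is typically what a genetic operator would call before accepting a
modified roster.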
Some important comments can be made:

The cost resulting from the assignment of a pairing to a given crew member
cannot be determined with precision before the whole workload of this crew member
has been defined. In many airlines, crew payment is computed in a complex way
from many parameters such as the total amount of flown hours, standby duties and
overtime. The above formulation can therefore be a poor approximation of the real
assignment problem.

Other relevant constraints can be added as soft constraints to the mathematical
formulation of the Airline Crew Rostering Problem. For instance, many airlines
have found it important to integrate the preferences of the crew staff in the
ACRP. Indeed, in most airlines the crew staff can declare their preferences for
some pairings over the nominal rostering period, even if the final assignment
decision is left to the airline operations management. Then, in addition to the
minimum cost objective, it seems worthwhile to introduce an overall crew
satisfaction objective either as a secondary criterion or as a soft constraint.

Exact solutions for the resulting large-scale combinatorial optimization problems
are not available in an acceptable computing time, and in general airlines have
produced their crew schedules using simplistic heuristics. However, during the
last decade a number of major airlines have developed optimization-based
techniques to solve the ACRP: improved heuristics based on column generation
techniques and exact solution approaches based on constraint logic programming
[8], [20].



3.2 Introduction of the Crew Satisfaction Level 

For an airline, it is not easy to adopt a global index representative of the
overall crew satisfaction level, since the latter is composed of a large number of
individual evaluations of different elements such as total flown time, total
standby time and satisfaction of pairing preferences over the current and the
previous planning periods. A standby duty occurs when a pilot has to remain ready
for the possible replacement of other unavailable crew members. The number of
standby duties assigned to a given crew member over a planning period should
remain small since this situation is not attractive, its hourly rate being much
lower than that of flights.

However, a simplified approach can be adopted if it is possible to represent by a
real number $S^{+}_{ij}$ the increment of satisfaction of crew member “i” when he
is assigned his requested pairing “j”, and by a real number $S^{-}_{ij}$ the
increment of satisfaction of crew member “i” when he is not assigned pairing “j”
which he wants to avoid. The current degree of satisfaction of crew member “i”
can then be given by:

$$DOS_i = \sum_{j \in J} \left( S^{+}_{ij}\, x_{ij} + S^{-}_{ij}\, (1 - x_{ij}) \right) \qquad (6)$$

New restrictions could then be introduced in the nominal formulation of the ACRP,
such as:

$$DOS_i[x_{ij},\ j \in J] \ge DOS_{\min} \qquad \forall i \in I \qquad (7)$$

to ensure a minimum degree of satisfaction for each crew member, and such as:

$$\pi_{ik} \cdot DOS_i[x_{ij},\ j \in J] \ge DOS_k[x_{kj},\ j \in J] \qquad \forall i, k \in I,\ i \ne k \qquad (8)$$

where $\pi_{ik}$ is a relative seniority index, to ensure equity, tempered by
seniority, between the different crew members.
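
As an illustration of how the degree of satisfaction and the two restrictions
above could be evaluated for a given roster, a minimal sketch follows. The
function names, the dictionary-based data layout and the argument names are
assumptions made for this example only.

```python
def degree_of_satisfaction(assigned, gain_requested, gain_avoided):
    """Degree of satisfaction of one crew member.

    assigned: set of pairings given to the crew member (x_ij = 1).
    gain_requested[j]: increment earned when requested pairing j is assigned.
    gain_avoided[j]: increment earned when unwanted pairing j is NOT assigned.
    The data layout is an illustrative assumption.
    """
    dos = sum(s for j, s in gain_requested.items() if j in assigned)
    dos += sum(s for j, s in gain_avoided.items() if j not in assigned)
    return dos


def satisfies_restrictions(dos, seniority_index, dos_min):
    """Check the minimum-satisfaction and seniority-tempered equity restrictions.

    dos[i] is the degree of satisfaction of crew member i and
    seniority_index[(i, k)] the relative seniority index of i with respect to k.
    """
    if any(value < dos_min for value in dos.values()):
        return False
    return all(
        seniority_index[(i, k)] * dos[i] >= dos[k]
        for i in dos for k in dos if i != k
    )
```

The number of pairwise equity checks grows with the square of the staff size,
which is one reason why such restrictions make the problem notably harder.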

This approach, although straightforward, notably increases the difficulty of the
ACRP by introducing a large number of constraints in its formulation, and requires
the definition of many satisfaction-related parameters.

A first bi-criterion solution approach can adopt the following steps:

1) Solve by an approximate method the ACRP with restrictions (1) to (5); let
$C_{\min}$ be the resulting minimum operations cost.

2) Choose a real number $\lambda$ greater than unity.

3) Solve the following optimization problem:

$$\text{Max} \quad \sum_{i \in I} w_i \sum_{j \in J} \left( S^{+}_{ij}\, x_{ij} + S^{-}_{ij}\, (1 - x_{ij}) \right)$$

subject to constraints (1) to (5) and

$$\sum_{j \in J} \sum_{i \in A_j} c_{ij}\, x_{ij} \le \lambda\, C_{\min} \qquad (9)$$

where the $w_i$ are positive normalized seniority weights.

4) If this last constraint is active at the solution, evaluate accurately the
degree of satisfaction of each crew member; if too many crew members present an
inadequate degree of satisfaction, increase parameter $\lambda$ and go back to
step 3; otherwise, stop.
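
The loop described by these steps can be illustrated on a toy instance small
enough to be enumerated exhaustively. The sketch below is not the approximate
method referred to above: it replaces it with brute-force enumeration, it treats
any crew member below `dos_target` as "too many", and it does not test whether
the cost constraint is active. All names are hypothetical.

```python
from itertools import product


def bi_criterion_search(pilots, pairings, cost, satisfaction, weights,
                        feasible, dos_target, lam=1.1, step=0.1, lam_max=2.0):
    """Brute-force illustration of the cost-capped satisfaction maximization.

    A solution is a tuple giving the pilot assigned to each pairing.
    cost(sol), satisfaction(sol, pilot) and feasible(sol) are caller-supplied
    (hypothetical) evaluators of a complete solution.
    """
    solutions = [s for s in product(pilots, repeat=len(pairings)) if feasible(s)]
    c_min = min(cost(s) for s in solutions)  # minimum operations cost

    def weighted_satisfaction(s):
        return sum(weights[p] * satisfaction(s, p) for p in pilots)

    while True:
        # Keep only the solutions whose cost stays below lam * c_min.
        admissible = [s for s in solutions if cost(s) <= lam * c_min]
        best = max(admissible, key=weighted_satisfaction)
        unhappy = [p for p in pilots if satisfaction(best, p) < dos_target]
        if not unhappy or lam + step > lam_max:
            return best, lam
        lam += step  # relax the cost cap and solve again
```

On realistic instances the enumeration would have to be replaced by an
approximate solver, since the number of assignments grows exponentially with the
number of pairings.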

The main difficulties of this first bi-criterion approach are related to the
quantification of the satisfaction degrees (values of the $S^{+}_{ij}$ and
$S^{-}_{ij}$, analytical expression for the $DOS_i$) and to the choice of the
seniority weights, which can hardly avoid being subjective.
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4 A New Solution Approach for the Nominal Airline Crew 
Rostering Problem 

It has been observed in Section 3 that the cost structure of the ACRP is not
separable with respect to pairings: to estimate accurately the operations cost
associated with a given crew member, his entire “line of work” for the next
planning period must be known. Much the same can be said about the individual and
global crew satisfaction levels over a planning period. At this point, Genetic
Algorithm (GA) techniques [15], [16], [17], [18], which manipulate complete
solutions, appear attractive. This property of GA has been used in [19] to propose
a new solution approach for the ACPP. Recall that the solution search process of
GA is based on the stochastic improvement of a set of initial solutions (“the
initial population”) through the use of operators that mimic the selection and
evolution processes of natural species.

Here the ACRP is considered to present a main criterion, the airline operations
cost, and a secondary criterion, the overall satisfaction degree of the staff. In
order to generate a representative set of near-Pareto solutions, it is proposed to
start from a population of solutions obtained from a heuristic whose aim is to
maximize the overall degree of satisfaction regardless of the operations cost.
Genetic techniques are then used to generate new solution sets with reduced
operations cost over a sequence of “generations”. During this process, the crew
satisfaction degrees of the solutions composing the current generation decrease
somewhat. The three main stages of the proposed approach are shown in Fig. 2.




[Flowchart; last stage: Final selection of a non-inferior crew rostering solution]

Fig. 2. The proposed GA-based solution approach
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4.1 A Greedy Heuristic Approach to Build the Initial Population 

Once a representative index of the satisfaction of each crew member has been made
available (see [22] for an approach based on fuzzy set theory), the initial
solution set can be built with a single greedy heuristic technique (the Crew
Satisfaction Heuristic, CSH) applied to different arrangements of the set of
crews. Note that the satisfaction of the crew also builds up from past assignments
to different activities. It is supposed here that at the beginning of a new
planning period each crew is characterized by a degree of satisfaction belonging
to a qualitative scale such as

{VeryLow, Low, Medium, Fair, High, VeryHigh}. With the proposed greedy heuristic,
the preferences of the crew members with the lowest satisfaction degree are
considered first. This heuristic technique is divided into three stages. In the
first stage, the crew members with the lowest satisfaction are assigned, when
possible, to their requested pairings. In the second stage, the remaining rejected
pairings are assigned to the crew members who do not mind them. However, if all
remaining crew members wish to avoid some pairings, then the crew members with the
highest satisfaction level will have to cope with them. Finally, the remaining
pairings are assigned to the crew members so as to complete their “lines of work”
in a balanced way.

To build the initial population this process must be repeated until the desired
population size has been reached. Since each satisfaction class “i” of cardinality
$n_i$ admits $n_i!$ different arrangements, and since $n_i$ can be high, the total
number of different arrangements can be extremely high. If the heuristic
assignment process sometimes produces identical assignment solutions, it is easy
to draw at random another arrangement in each satisfaction class, so that a new
initial assignment solution is obtained.

Since the set of pairings can be structured as a directed graph when temporal
precedence, reachability and overlapping constraints are considered (see Table 1,
where $X^{-}$ and $X^{+}$ represent the starting and the completion times of
pairing X), each element of the initial population can be represented by a set of
“n” independent paths in the corresponding pairings precedence graph (PPG).



Table 1. Precedence and overlapping relations 



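Relation         Temporal conditions

X before Y       $X^{-} < X^{+} < Y^{-} < Y^{+}$
X meets Y        $X^{-} < X^{+} = Y^{-} < Y^{+}$
X overlaps Y     $X^{-} < Y^{-} < X^{+} < Y^{+}$
Y finishes X     $X^{-} < Y^{-} < X^{+} = Y^{+}$
Y during X       $X^{-} < Y^{-} < Y^{+} < X^{+}$
X starts Y       $X^{-} = Y^{-} < X^{+} < Y^{+}$
X equals Y       $X^{-} = Y^{-} < X^{+} = Y^{+}$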
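
The temporal relations of Table 1 can be tested directly on the start and
completion times of two pairings. The sketch below is illustrative only: it
represents a pairing as a hypothetical `(start, end)` tuple and ignores
reachability (for example, the base where each pairing ends) and any mandatory
rest time.

```python
def relation(x, y):
    """Classify how pairing x relates to pairing y, following Table 1.

    x and y are (start, end) tuples with start < end.
    """
    xs, xe = x
    ys, ye = y
    if xe < ys:
        return "X before Y"
    if xe == ys:
        return "X meets Y"
    if xs == ys and xe == ye:
        return "X equals Y"
    if xs == ys and xe < ye:
        return "X starts Y"
    if xs < ys and xe == ye:
        return "Y finishes X"
    if xs < ys and ye < xe:
        return "Y during X"
    if xs < ys < xe < ye:
        return "X overlaps Y"
    return None  # listed in Table 1 only after swapping the roles of x and y


def can_follow(x, y):
    """True if y may come after x in one line of work (temporal test only)."""
    return relation(x, y) in ("X before Y", "X meets Y")
```

An arc from x to y in the precedence graph would then exist only when
`can_follow(x, y)` holds and the reachability conditions are also met.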




494 W. El Moudani et al. 



4.2 A New GA Solution Strategy for the Nominal Airline Crew Rostering 
Problem Using GA 

Genetic algorithms process sets of feasible solutions, called “populations”, where
each element of these populations is similar to a chromosome composed of genes,
each gene corresponding to a particular parameter of the problem. The chromosomes
can be represented using either a binary or a non-binary coding.

For the problem tackled here, a non-binary coding has been adopted for the
representation of the solutions composing a given population. The i-th component
of a “chromosome” indicates which crew member is assigned to the i-th pairing
(Fig. 3). The population size considered in the case studied here is 30
chromosomes.




Fig. 3. The adopted coding for an ACRP solution

This coding has been chosen in order to minimize the memory required to encode a
whole population; however, it is straightforward to associate by inspection, to
each of these chromosomes, a set of “n” independent paths in the PPG (Fig. 4).
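
To make the coding concrete, the sketch below decodes a chromosome into one
chronologically ordered list of pairings per crew member, which corresponds to
one path per crew member in the PPG. The function name and the `start_times`
input are assumptions introduced for this example.

```python
from collections import defaultdict


def lines_of_work(chromosome, start_times):
    """Group pairings by crew member and order each group chronologically.

    chromosome[j] is the crew member assigned to pairing j (non-binary coding).
    start_times[j] is the starting time of pairing j (hypothetical input).
    """
    lines = defaultdict(list)
    for pairing, crew in enumerate(chromosome):
        lines[crew].append(pairing)
    return {crew: sorted(assigned, key=lambda j: start_times[j])
            for crew, assigned in lines.items()}
```

For example, with `chromosome = [2, 0, 2, 1, 0]`, pairings 0 and 2 belong to
crew member 2, pairings 1 and 4 to crew member 0, and pairing 3 to crew member 1.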

A selection operator chooses the chromosomes which become parents according to
their evaluation values. A roulette wheel method picks two chromosomes to which
classical GA operators such as crossover, inversion and mutation are applied
according to chosen probabilities.
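
A roulette wheel selection can be sketched as follows. Since the criterion here
is a cost to be minimized, the sketch converts costs into fitness values by
subtracting them from the worst cost; this conversion is an assumption of the
example, as the text does not specify how evaluation values are mapped onto the
wheel.

```python
import random


def roulette_pick(population, fitness, rng=random):
    """Pick one chromosome with probability proportional to its fitness."""
    total = sum(fitness)
    threshold = rng.uniform(0, total)
    running = 0.0
    for chromosome, fit in zip(population, fitness):
        running += fit
        if fit > 0 and running >= threshold:
            return chromosome
    return population[-1]  # guard against floating-point round-off


def select_parents(population, costs, rng=random):
    """Select two parents; a lower cost gives a larger slice of the wheel."""
    worst = max(costs)
    # Assumed cost-to-fitness mapping; the small constant keeps every slice positive.
    fitness = [worst - c + 1e-9 for c in costs]
    return roulette_pick(population, fitness, rng), roulette_pick(population, fitness, rng)
```

Passing a seeded `random.Random` instance as `rng` makes a run reproducible.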

Classical genetic operators (crossover, mutation and inversion) have been adapted
to the context of the present assignment problem to progressively produce improved
new generations. Each elementary genetic operation indirectly processes the
pairing assignments of two different pilots. When a noticeable improvement of the
operations cost is obtained, the local modification is retained; when local
solution costs are equivalent, the respective degrees of satisfaction of the two
pilots are taken into consideration to make the choice.

The stopping rule adopted here is based not only, as usual with GA, on the allowed 
maximum number of non-improving successive generations, but also on the propor- 
tion in the current population of the cost effective solutions for which too many indi- 
vidual degrees of satisfaction are insufficient (“Low” or “Very Low”), since these cost 
effective solutions will represent a high risk for the airline. 
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Fig. 4. Feasibility of the assignment solutions 



4.3 Problem-Specific Mutation and Inversion Operators 

In order to speed up the discovery of the most promising assignment solutions, a
local heuristic technique has been introduced to restrict the search space of the
mutation and inversion processes. It consists in dividing the crew members into
two sets, $WL_L$ and $WL_H$, corresponding to workloads lower and higher than the
current average workload, which is computed for the whole staff.

With respect to the mutation operator, a position “k” is chosen at random in the
chromosome, and the crew member “x” found in position “k” is replaced by another
crew member “y” chosen randomly in set $WL_L$ and for which this new task is
feasible (the precedence constraints in the PPG must be checked for that). The
idea here is to contribute actively to a more balanced workload over the whole
staff. The solution processed by the mutation operator then undergoes the
following modification:

$$\text{if } SOL[k] = x,\ x \in WL_L \cup WL_H, \text{ then } \exists\, y \in WL_L : SOL[k] = y \text{ with } WL_x = WL_x - dr_k,\ WL_y = WL_y + dr_k, \quad \forall k \in \{1, \dots, n\},\ x \ne y \qquad (10)$$

where $SOL[k] = y$ means that the assignment solution array SOL contains the crew
member “y” in position “k” and $dr_k$ is the duration of pairing “k”, i.e. the
pairing “k” of duration $dr_k$ is now assigned to crew member “y”.
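
The mutation operator described above can be sketched as follows. This is a
minimal illustration under stated assumptions: the function name is invented,
workloads are recomputed from scratch from the pairing durations, and the
feasibility test is delegated to a caller-supplied (hypothetical) function
`feasible(sol, k, y)` standing for the check of the precedence constraints in
the PPG.

```python
import random


def mutate(sol, durations, crews, feasible, rng=random):
    """Reassign one randomly chosen pairing to a crew with below-average workload.

    sol[k] is the crew member flying pairing k and durations[k] its duration.
    feasible(sol, k, y) is a hypothetical check that crew y can take pairing k.
    Returns a new list; sol itself is left untouched.
    """
    workload = {c: 0.0 for c in crews}
    for k, crew in enumerate(sol):
        workload[crew] += durations[k]
    average = sum(workload.values()) / len(crews)

    k = rng.randrange(len(sol))
    x = sol[k]
    # Candidate receivers: crews in the low-workload set for which the move is feasible.
    candidates = [y for y in crews
                  if y != x and workload[y] < average and feasible(sol, k, y)]
    if not candidates:
        return list(sol)  # no admissible move: the solution is unchanged
    y = rng.choice(candidates)
    mutated = list(sol)
    mutated[k] = y  # the workload of x drops by durations[k], that of y rises by it
    return mutated
```

Returning a copy rather than modifying `sol` in place keeps the parent available
if the offspring is later rejected.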

In the case of the inversion operator, a position “i” corresponding to pairing “i”
is chosen randomly. If it is assigned to a crew member “x” belonging to $WL_H$,
then a crew member “y” belonging to $WL_L$ is chosen at random, and a pairing “j”
initially performed by crew member “y” is selected randomly. If its duration is
less than the duration of pairing “i” and the inversion is feasible, then a new
solution is produced where:

$$SOL[i] = y \text{ and } SOL[j] = x \qquad \forall i, j \in \{1, \dots, n\},\ i \ne j,\ x \ne y \qquad (11)$$









In the case where position “i” corresponds to a crew member belonging to $WL_L$,
the above process is inverted.
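
A possible reading of the inversion operator, including the inverted case, is
sketched below. The function name, the recomputation of workloads and the
caller-supplied feasibility test `feasible(sol, i, j)` are assumptions of this
example; crew members whose workload equals the average are treated here as
belonging to the low-workload side, which the text does not specify.

```python
import random


def invert(sol, durations, crews, feasible, rng=random):
    """Exchange two pairings between an overloaded and an underloaded crew member.

    sol[k] is the crew member flying pairing k and durations[k] its duration.
    feasible(sol, i, j) is a hypothetical check that the exchange respects the PPG.
    Returns a new list; sol itself is left untouched.
    """
    workload = {c: 0.0 for c in crews}
    for k, crew in enumerate(sol):
        workload[crew] += durations[k]
    average = sum(workload.values()) / len(crews)

    i = rng.randrange(len(sol))
    x = sol[i]
    if workload[x] > average:
        # x is overloaded: receive a shorter pairing from an underloaded crew.
        partners = [c for c in crews if workload[c] < average]
        balances = lambda j: durations[j] < durations[i]
    else:
        # Inverted case: x receives a longer pairing from an overloaded crew.
        partners = [c for c in crews if workload[c] > average]
        balances = lambda j: durations[j] > durations[i]
    if not partners:
        return list(sol)
    y = rng.choice(partners)
    positions = [j for j, crew in enumerate(sol) if crew == y]
    if not positions:
        return list(sol)
    j = rng.choice(positions)
    if not (balances(j) and feasible(sol, i, j)):
        return list(sol)
    swapped = list(sol)
    swapped[i], swapped[j] = y, x
    return swapped
```

Unlike the mutation, this operator keeps the number of pairings of each crew
member unchanged and only shifts flying hours between the two.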

The final assignment solutions are ranked by cost and, when costs are equal, by an
overall satisfaction degree criterion.



5 Case Study 

This solution approach has already been applied to a medium-size problem where 75
crew members must be assigned to 275 pairings corresponding to a total of 4250
flight hours. Some lessons can be drawn from this first application with respect
to the computational effectiveness of the proposed approach and with respect to
the quality of the solution set obtained.



5.1 Computer Effectiveness Results 

In the first step of the proposed approach, the satisfaction-based computation of
the initial population through the CSH greedy heuristic is fast and requires
relatively little computing effort. As for the subsequent genetic algorithm, the
different genetic operators do not perform equally: the crossover operator, which
is quite time consuming, contributes little to producing new promising assignment
solutions since the set of constraints to be checked is very large, while the
mutation and inversion operators appear to be more efficient at generating new
assignment solutions with relatively moderate computing times. The genetic
operators are applied according to the probabilities $p_c = 0.20$, $p_m = 0.40$
and $p_i = 0.40$.

To compare the effect of the different combinations of genetic operators on the
final cost-effective solutions, two parameters have been introduced: the workload
deviation index $\sigma_{WL}$ and the proportion of non-satisfied crews
$\tau_{NS}$. Here $\sigma_{WL}$ is given by:

$$\sigma_{WL} = \sqrt{\sum_{i=1}^{n} (WL_i - \overline{WL})^2 / n} \qquad (12)$$

where $WL_i$ is the workload assigned to crew member “i”, $\overline{WL}$ is the
average workload for the rostering period and “n” is again the size of the staff.
The comparative results are displayed in Table 2.
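
The two indicators can be computed as sketched below. The sketch assumes that
the workload deviation index is the root-mean-square deviation of the individual
workloads from their mean, and that a crew member counts as non-satisfied when
his level is “Low” or “VeryLow”, in line with the stopping rule of Section 4.2.
The function names are illustrative.

```python
from math import sqrt


def workload_deviation(workloads):
    """Root-mean-square deviation of the workloads from their mean (assumed form)."""
    n = len(workloads)
    mean = sum(workloads) / n
    return sqrt(sum((w - mean) ** 2 for w in workloads) / n)


def proportion_unsatisfied(levels, insufficient=("VeryLow", "Low")):
    """Share of crew members whose qualitative satisfaction level is insufficient."""
    return sum(level in insufficient for level in levels) / len(levels)
```

Both functions take plain lists, so they can be applied to any solution of the
final population once its workloads and satisfaction levels have been evaluated.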
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Table 2. Effectiveness of CSH and genetic processes for the ACRP 





                                   Minimum Cost   Cost Dispersion   Proportion of non-satisfied
                                   Solution       ($\sigma_{WL}$)   crews ($\tau_{NS}$)

CSH Heuristic                      4312:30        4:50              0.13
Crossover alone                    4304:15        4:35              0.25
Mutation alone                     4290:30        3:30              0.49
Inversion alone                    4291:00        3:35              0.42
Crossover + Mutation + Inversion   4278:30        3:20              0.48



5.2 Analysis of the Assignment Solutions 

In Figure 5, the total cost and the proportion of non-satisfied crews are
displayed for the final population, to give an idea of the non-inferior set of
solutions to the considered problem. In Figures 6.a and 6.b, the individual
workloads and satisfaction degrees corresponding to its extreme solutions, $S_L$
and $S_H$, are displayed. In Figures 7.a and 7.b, the individual workload
distributions generated by CSH and by GA are displayed.




[Plot; axis label: Total flying hours]

Fig. 5. Final solution set



[Plots; axis labels: Level of satisfaction, Flying hours; marked levels: Lowest
workload, Average workload, Highest workload]

Fig. 6.a. Solution corresponding to $S_H$

Fig. 6.b. Solution corresponding to $S_L$

Fig. 7.a. Example of workload distribution corresponding to the greedy heuristic

Fig. 7.b. Example of final workload distribution with GA ($S_L$ solution)



6 Conclusion 

In this paper, one of the main operations decision problems faced by airlines has
been dealt with using Mathematical Programming and Computational Intelligence
methods. The proposed approach does not produce an exact solution in pure
mathematical terms but appears to be well suited to giving real support to
decision making, by providing, through a comprehensive process, an improved
approximation of the set of non-inferior solutions attached to this bi-criterion
decision problem.
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Abstract. A halftoning technique that uses a simple GA has proven 
to be very effective to generate high quality halftone images. Recently, 
the two major drawbacks of this conventional halftoning technique 
with GAs, i.e. it uses a substantial amount of computer memory 
and processing time, have been overcome by using an improved GA 
(GA-SRM) that applies genetic operators in parallel putting them in a 
cooperative-competitive stand with each other. The halftoning problem 
is a true multiobjective optimization problem. However, so far, the 
GA based halftoning techniques have treated the problem as a single 
objective optimization problem. In this work, the improved GA-SRM 
is extended to a multiobjective optimization GA to generate simulta- 
neously halftone images with various combinations of gray level and 
spatial resolution. Simulation results verify that the proposed scheme 
can effectively generate several high quality images simultaneously in a 
single run reducing even further the overall processing time. 

Keywords: multiobjective genetic algorithm, multiobjective optimiza- 
tion, halftoning problem, cooperative-competitive genetic operators. 



1 Introduction 

The multiobjective nature of most real-world problems makes multiobjective 
optimization (MO) a very important research topic. Evolutionary algorithms 
(EAs) seem particularly desirable to solve MO problems because they evolve 
simultaneously a population of potential solutions to the problem in hand, which 
allows to search for a set of Pareto optimal solutions concurrently in a single run 
of the algorithm. Many authors have been increasingly investigating MO using 
EAs in recent years and the number of applications has been rapidly growing
[1], [2], [3], [4]. In the signal processing area, application methods using EAs, especially
genetic algorithms (GAs), are also steadily being developedjS|. 

In this work, we especially focus on the image halftoning technique using GAs. 
Kobayashi et al. PD use a GA to generate bi-level halftone images with quality 
higher than conventional techniques such as ordered dithering, error diffusion 
and so on^. However, it uses a substantial amount of computer memory and 
processing time 03 Recently, Aguirre et al . [Ill 1 )j have proposed an improved 
GA (GA-SRM) to overcome these two drawbacks of the conventional halftoning 
technique with GAs. GA-SRM is based on an empirical model of GA that applies 
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genetic operators in parallel putting them in a cooperative-competitive stand 
with each other [11], [12], [13], [14]. The improved GA-SRM, extended to the halftoning
problem, can generate high quality images achieving a 98% reduction in the 
population size and an 85%-70% reduction in processing time. 

The halftoning problem is a true MO problem in which high gray level and 
high spatial resolution must be sought to achieve high quality images. The GA 
based halftoning techniques mentioned above, however, treat the problem as a 
single objective optimization problem and can generate only one image at a time. 

In this work, the improved GA-SRM [9], [10] is extended to a multiobjective
optimization GA to simultaneously generate halftone images with various
combinations of gray level and spatial resolution. The simulation results show
that the proposed scheme can effectively generate several images in a single run,
reducing the overall processing time even further.



2 Halftoning Problem with GAs 

Digital halftoning, a key component of an image display preprocessor, is the
method that creates the illusion of continuous tone pictures on printing and
displaying devices that are capable of producing only binary picture elements.
The fast-growing computer and information industry requires ever higher image
quality and demands higher resolution devices. Halftoning algorithms capable of
delivering the appropriate image quality for such devices are also needed.

Kobayashi et al. m use a GA to generates bi-level halftone images with 
quality higher than traditional techniques such as ordered dithering, error dif- 
fusion and so on0. An input gray tone image of R gray levels is divided into 
non-overlapping blocks of n x n pixels, and then the 2-dimensional optimum 
binary pattern for each image block is searched using a G A [tip?] . The GA uses 
an n × n 2-dimensional binary representation for the individuals. Crossover
interchanges either sets of adjacent rows or columns between two individuals, and
mutation inverts bits with a very small probability per bit after crossover,
similar to the canonical GA [15], [16]. Individuals are evaluated for two factors
required to obtain visually high quality halftone images: (i) one is high gray
level resolution (local mean gray levels close to the original image), and (ii) the other
is high spatial resolution (appropriate contrast near edges 1 1^17] . The gray level 
resolutions error is calculated by 

$$E_m(x_i^{(t)}) = \sum_{(j,k) \in block} \left| p(j,k) - p_b(j,k) \right| \qquad (1)$$

where $x_i^{(t)}$ is the i-th individual at the t-th generation, $p(j,k)$ is the
gray level of the (j,k)-th pixel in the original image block, and $p_b(j,k)$ is
the estimated gray level associated to the (j,k)-th pixel from the generated
binary block. To obtain $p_b(j,k)$, a reference region around the (j,k)-th binary
pixel (for example 5×5 pixels) is convolved with a Gaussian filter that models the
correlation among pixels. On the other hand, the spatial resolution error is
calculated by
$$E_c(x_i^{(t)}) = \sum_{(j,k) \in block} \left| \left( p(j,k) - p_s(j,k) \right) - \left( g(j,k) - \tfrac{1}{2} \right) R \right| \qquad (2)$$

where $p_s(j,k)$ is the local mean gray level around the (j,k)-th pixel (within a
reference region) in the original image block, and $g(j,k)$ is the binary level of
the (j,k)-th pixel in the generated image block. These two errors are combined
into one single objective function as

$$e(x_i^{(t)}) = \omega_m E_m(x_i^{(t)}) + \omega_c E_c(x_i^{(t)}) \qquad (3)$$

where $\omega_m$ and $\omega_c$ are the weighting parameters for gray level and
spatial resolution errors, respectively. The individuals’ fitness is assigned by

$$f(x_i^{(t)}) = e(x_W^{(t)}) - e(x_i^{(t)}) \qquad (4)$$

where $e(x_W^{(t)})$ is the combined error of the worst individual at the t-th generation. The
high image quality that can be achieved is the method’s major strength. However, 
it uses a substantial amount of computer memory and processing time. High 
quality, visually satisfactory, halftone images are obtained with 200 individuals 
and 200 generations (totally 40,000 evaluations) per image blockjSd- 
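
To give a feel for the cost of one evaluation, the sketch below computes a gray
level error of the kind described above for one block, together with the
fitness relative to the worst individual. It is an illustration under several
assumptions that are not in the text: the Gaussian width `sigma`, the
renormalization of the filter at block borders, binary pixels coded as 0 or 1
and scaled to `levels - 1`, and all function names.

```python
from math import exp


def gaussian_kernel(size=5, sigma=1.0):
    """Normalized size x size Gaussian weights (sigma is an assumed value)."""
    half = size // 2
    raw = [[exp(-(a * a + b * b) / (2 * sigma * sigma))
            for b in range(-half, half + 1)] for a in range(-half, half + 1)]
    total = sum(map(sum, raw))
    return [[v / total for v in row] for row in raw]


def gray_level_error(original, binary, levels=256, kernel=None):
    """Sum over the block of |p(j,k) - p_b(j,k)|.

    original: n x n gray levels in [0, levels - 1]; binary: n x n values in {0, 1}.
    The filter is renormalized near the block border (an assumption).
    """
    kernel = kernel or gaussian_kernel()
    half = len(kernel) // 2
    n = len(original)
    error = 0.0
    for j in range(n):
        for k in range(n):
            acc = weight = 0.0
            for a in range(-half, half + 1):
                for b in range(-half, half + 1):
                    jj, kk = j + a, k + b
                    if 0 <= jj < n and 0 <= kk < n:
                        w = kernel[a + half][b + half]
                        acc += w * binary[jj][kk]
                        weight += w
            estimated = (levels - 1) * (acc / weight)
            error += abs(original[j][k] - estimated)
    return error


def fitness(errors):
    """Fitness relative to the worst individual: the worst one gets 0."""
    worst = max(errors)
    return [worst - e for e in errors]
```

Each evaluation visits every pixel of the block and its whole reference region,
which helps explain why the number of evaluations per block dominates the
processing time.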

Recently, Aguirre et al. [9], [10] have proposed an improved GA (GA-SRM) to
overcome these two drawbacks of the conventional halftoning technique with
GAs. GA-SRM is based on an empirical model of GA that applies genetic operators
in parallel, putting them in a cooperative-competitive stand with each
other [11], [12], [13], [14]. GA-SRM is applied to the halftoning image problem using
genetic operators properly modified for this kind of problem ( see I4.3B . GA-SRM 
with parallel adaptive dynamic block (ADB) mutation impressively reduces pro- 
cessing time and computer memory to generate high quality images. For example, 
GA-SRM with qualitative ADB using a 2 parent 4 offspring configuration needs 
about 6,000-12,000 evaluations per image block, depending on the input image, 
to obtain results similar to those achieved by the conventional image halftoning 
technique using GAs. These data represent a 98% reduction in the population 
size and an 85%-70% reduction in processing time. 

3 Multiobjective Optimization (MO) 

MO methods deal with finding optimal solutions to problems having multiple 
objectives. Let us consider, without loss of generality, a minimization multiob- 
jective problem with M objectives: 

$$\text{minimize} \quad g(x) = (g_1(x), \dots, g_M(x)) \qquad (5)$$

where $x \in X$ is a solution vector in the solution space X, and
$g_1(\cdot), \dots, g_M(\cdot)$ are the M objectives to be minimized. Key concepts
used in determining a set of solutions for multiobjective problems are dominance,
Pareto optimality, Pareto set, and Pareto front. These concepts can be defined as
follows.

A solution vector $y \in X$ is said to dominate a solution vector $z \in X$,
denoted by $g(y) \prec g(z)$, if and only if y is partially less than z, i.e.,
$\forall j \in \{1, \dots, M\}: g_j(y) \le g_j(z) \wedge \exists j \in \{1, \dots, M\}: g_j(y) < g_j(z)$.

A solution vector $x \in X$ is said to be Pareto optimal with respect to X if it
is not dominated by any other solution vector, i.e.,
$\neg \exists\, x' \in X : g(x') \prec g(x)$. The presence of multiple objectives,
usually conflicting with one another, gives rise to a set of optimal solutions.
The Pareto optimal set is defined as:

$$P = \{ x \in X \mid \neg \exists\, x' \in X : g(x') \prec g(x) \} \qquad (6)$$

and the Pareto front is defined as:

$$PF = \{ g(x) = (g_1(x), \dots, g_M(x)) \mid x \in P \} \qquad (7)$$
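
For a finite set of candidate solutions, these definitions translate directly
into code. The sketch below is a straightforward quadratic-time filter written
for clarity rather than speed; the function names are illustrative.

```python
def dominates(gy, gz):
    """True if objective vector gy dominates gz (all objectives minimized)."""
    no_worse = all(a <= b for a, b in zip(gy, gz))
    strictly_better = any(a < b for a, b in zip(gy, gz))
    return no_worse and strictly_better


def pareto_front(points):
    """Return the non-dominated objective vectors, in their original order."""
    return [p for p in points if not any(dominates(q, p) for q in points)]
```

Note that two identical objective vectors do not dominate each other, so
duplicates of a non-dominated point are all kept by this filter.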

The multiobjective nature of most real-world problems makes MO a very 
important research topic. The presence of various objectives, however, implies 
trade-off solutions and makes these problems complex and difficult to solve. EAs 
seem particularly desirable to solve MO problems because they evolve simulta- 
neously a population of potential solutions to the problem in hand, which allows 
to search for a set of Pareto optimal solutions concurrently in a single run of the 
algorithm. 

Many authors have been increasingly investigating MO using EAs (MOEA) and the
number of applications has been growing rapidly. The list of contributors to the
field is extensive, and comprehensive reviews can be found in [1], [2], [3], [4].
Fonseca and Fleming [1] and Horn [2] examine major MOEA techniques; Coello [3]
presented a MOEA review classifying implementations from a detailed algorithmic
standpoint, discussing the strengths and weaknesses of each technique. Recently,
Van Veldhuizen and Lamont [4] expanded upon these reviews.



4 GA-SRM Extension to MO 

4.1 Concept of GA-SRM 

We have presented an empirical model of GA that puts parallel genetic operators
in a cooperative-competitive stand with each other, pursuing better balances for
crossover and mutation over the course of a run. The main features of the model
are (i) two genetic operators with complementary roles applied in parallel to
create offspring: Self-Reproduction with Mutation (SRM), which puts emphasis on
mutation, and Crossover and Mutation (CM), which puts emphasis on recombination;
(ii) an extinctive selection mechanism; and (iii) an adaptive mutation schedule
that varies SRM’s mutation rates from high to low values based on SRM’s own
contribution to the population.

The parallel formulation of genetic operators allows the combination of crossover
with high mutation rates while avoiding interference between operators, i.e.
beneficial recombinations produced by crossover are not lost due to the high
disruption introduced by parallel mutation and, similarly, the survivability of
beneficial mutations is not affected by ineffective crossover operations. The
parallel application of genetic operators implicitly increases the level of
cooperation between them to introduce and propagate beneficial mutations. It also
sets the stage for competition between the operators’ offspring.
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Although the parallel formulation of genetic operators can avoid interferences 
between operators, it does not prevent SRM from creating deleterious mutations 
or CM from producing ineffective crossing over operations. To cope with these 
cases we also incorporate in the model the concept of extinctive selection that 
has been widely used in Evolutionary Strategies^7|- Through extinctive selec- 
tion the offspring created by CM and SRM coexist competing for survival and 
reproduction as well. The poor performing individuals created by CM and SRM 
are eliminated. The parallel formulation of genetic operators tied to extinctive 
selection creates a cooperative-competitive environment for the offspring created 
by CM and SRM. GA-SRM based on this model remarkably improves the search 
performance of CA [ll)fl 411 S) . 



4.2 Multiobjective GA-SRM for Halftoning Problem 



To extend GA-SRM to MO for halftoning image generation we follow a coop- 
erative population search with aggregation selection [2l 1 1121 )I2 1 12‘J\ . The popu- 
lation is monitored for non-dominated solutions; however, Pareto based fitness 
assignment P323I is not directly used. A predetermined set of weights W, which 
weight the multiple objectives, defines the directions that the algorithm will
search simultaneously in the combined space of the multiple objectives. W is 
specified by 



W = ■ ■ ■ , 



( 8 ) 



where N indicates the number of search directions. The k-th search direction 
is a vector of nonnegative weights specified by 



$$ \omega^k = (\omega_1^k, \omega_2^k, \ldots, \omega_M^k) \qquad (9) $$



where M indicates the number of objectives and its components satisfy the 
following conditions 

$$ 0 \le \omega_i^k \le 1 \qquad (10) $$

$$ \sum_{i=1}^{M} \omega_i^k = 1 \qquad (11) $$



We evaluate individuals for the same two factors used in earlier work on GA
based halftoning (number of objectives M = 2): (i) high gray level resolution
and (ii) high spatial resolution. Here we use the same evaluation functions
$E_m$ and $E_c$, respectively, proposed there to calculate objective values and
assign their normalized values to each individual as indicated by






$$ g_1(x_i^{(t)}) = \frac{100 \times (E_m(x_i^{(t)}) - E_m^{min})}{E_m^{max} - E_m^{min}} \qquad (12) $$

$$ g_2(x_i^{(t)}) = \frac{100 \times (E_c(x_i^{(t)}) - E_c^{min})}{E_c^{max} - E_c^{min}} \qquad (13) $$

where $E_m^{max}$, $E_m^{min}$, $E_c^{max}$ and $E_c^{min}$ are maximum and minimum values for $E_m$
and $E_c$, respectively, obtained experimentally using various test images.

The objective values are calculated once for each individual in the offspring
population. However, we keep as many fitness values as defined search directions.
A combined objective value is calculated for each $\omega^k$ ($k = 1, 2, \ldots, N$) by

$$ g^k(x_i^{(t)}) = \sum_{j=1}^{M} \omega_j^k g_j(x_i^{(t)}) = \omega_1^k g_1(x_i^{(t)}) + \omega_2^k g_2(x_i^{(t)}) \qquad (14) $$

and the individuals' fitness in the k-th search direction is assigned by

$$ f^k(x_i^{(t)}) = g^k(x_W^{(t)}) - g^k(x_i^{(t)}) \qquad (15) $$

where $g^k(x_W^{(t)})$ is the combined objective value of the worst individual in the k-th
search direction at the t-th generation.
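
The evaluation steps of Eqs. (12) to (15) can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function names and the sample objective values are hypothetical, and the worst individual is assumed to be the one with the largest combined value, since both objectives are errors to be minimized.

```python
def normalize(error, lo, hi):
    """Scale a raw error to the 0-100 range, as in Eqs. (12) and (13)."""
    return 100.0 * (error - lo) / (hi - lo)


def combined_objective(weights, objectives):
    """Weighted sum of the normalized objectives, as in Eq. (14)."""
    return sum(w * g for w, g in zip(weights, objectives))


def fitness_table(weight_set, objectives_per_individual):
    """Return table[k][i], the fitness of individual i in direction k (Eq. (15))."""
    table = []
    for weights in weight_set:
        g = [combined_objective(weights, obj) for obj in objectives_per_individual]
        worst = max(g)  # assumption: errors are minimized, so the worst g is the largest
        table.append([worst - g_i for g_i in g])
    return table


# Eleven evenly spaced directions, from (0.0, 1.0) to (1.0, 0.0).
weight_set = [(k / 10, 1 - k / 10) for k in range(11)]

# Hypothetical normalized objective values (g_1, g_2) for three individuals.
population = [(10.0, 50.0), (30.0, 20.0), (60.0, 5.0)]
table = fitness_table(weight_set, population)
```

Each individual is evaluated once, but it receives one fitness value per search direction, so the same offspring can rank well in one direction and poorly in another.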

For each search direction $\omega^k$, CM creates a corresponding $\lambda_{CM}^k$ number of
offspring. Similarly, SRM creates $\lambda_{SRM}^k$ offspring (see detailed information about
CM and SRM implementation for the halftoning problem in Section 4.3). Thus, the total
offspring number for each search direction is

$$ \lambda^k = \lambda_{CM}^k + \lambda_{SRM}^k \qquad (16) $$



The offspring created for all N search directions coexist within one single
offspring population. Hence the overall offspring number is

$$ \lambda = \sum_{k=1}^{N} \lambda^k \qquad (17) $$



SRM's mutation rates are adapted based on a normalized mutants survival
ratio. The normalized mutant survival ratio used in the single objective GA-SRM
is extended to

$$ \gamma = \frac{\sum_{k=1}^{N} \mu_{SRM}^k}{\sum_{k=1}^{N} \lambda_{SRM}^k} \cdot \frac{\lambda}{\sum_{k=1}^{N} \mu^k} \qquad (18) $$

where $\mu^k$ is the number of individuals in the parent population of the k-th search
direction $P^k(t)$, $\mu_{SRM}^k$ is the number of individuals created by SRM present in
$P^k(t)$ after extinctive selection, $\lambda_{SRM}^k$ is the offspring number created by SRM
and $\lambda$ is the overall offspring number as indicated in Eq. (17).
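
As a worked illustration of Eq. (18), the ratio can be computed from per-direction counts. The function and argument names below are hypothetical, and the example counts are made up for the sketch; only the formula itself comes from the text.

```python
def normalized_mutant_survival_ratio(mu_srm, lam_srm, mu, lam_total):
    """Eq. (18): survival share of SRM offspring, normalized by lambda / sum(mu).

    mu_srm[k]  -- SRM-created individuals that survived into parent population k
    lam_srm[k] -- offspring created by SRM for direction k
    mu[k]      -- parents kept for direction k
    lam_total  -- overall number of offspring (Eq. (17))
    """
    return (sum(mu_srm) / sum(lam_srm)) * (lam_total / sum(mu))


# Hypothetical counts for N = 11 directions with mu^k = 2, lambda^k_SRM = 2, lambda = 44.
N = 11
gamma_balanced = normalized_mutant_survival_ratio([1] * N, [2] * N, [2] * N, 44)
gamma_extinct = normalized_mutant_survival_ratio([0] * N, [2] * N, [2] * N, 44)
```

With these counts `gamma_balanced` evaluates to 1.0, and `gamma_extinct` to 0.0 when no SRM offspring survive, which is the situation in which the ratio falls under the threshold and the mutation schedule is adjusted.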

We chose $(\mu, \lambda)$ Proportional Selection to implement the extinctive selection
mechanism. Since we want to search simultaneously in various directions,
selection to choose the parent individuals that will reproduce either with CM or
SRM is accordingly applied for each one of the predetermined search directions.
Thus, selection probabilities for each search direction are computed by

$$ P_s(x_i^{(t)}) = \begin{cases} f^k(x_i^{(t)}) \Big/ \sum_{j=1}^{\mu^k} f^k(x_j^{(t)}) & (1 \le i \le \mu^k < \lambda^k) \\ 0 & (\mu^k < i \le \lambda) \end{cases} \qquad (19) $$

where $x_i^{(t)}$ is an individual at generation t which has the i-th highest fitness
value in the k-th search direction, $\mu^k$ is the number of parents and $\lambda^k$ is
the number of offspring in the k-th search direction, and $\lambda$ is the overall number
of offspring.

Fig. 1. Block diagram of the extended multiobjective GA-SRM
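
A minimal sketch of the selection probabilities of Eq. (19) for one search direction follows. The function name and the tie handling (a uniform fallback when all selected fitness values are zero) are assumptions made for the illustration; the text does not specify that case.

```python
def selection_probabilities(fitness, mu_k):
    """Eq. (19) for one direction: only the mu_k fittest of all offspring can be parents.

    fitness -- fitness f^k of every individual in the overall offspring population
    mu_k    -- number of parents for this search direction
    """
    ranked = sorted(range(len(fitness)), key=lambda i: fitness[i], reverse=True)
    parents = ranked[:mu_k]
    total = sum(fitness[i] for i in parents)
    probs = [0.0] * len(fitness)  # individuals ranked below mu_k get probability 0
    for i in parents:
        # Assumption: fall back to a uniform choice if every selected fitness is 0.
        probs[i] = fitness[i] / total if total > 0 else 1.0 / mu_k
    return probs


# Hypothetical fitness values of four offspring in one direction, with mu^k = 2.
probs = selection_probabilities([0.0, 30.0, 45.0, 10.0], mu_k=2)
```

Because the ranking runs over the whole offspring population rather than only the offspring created for direction k, an individual created for another direction can receive a nonzero probability here.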

Note that for each search direction only $\lambda^k < \lambda$ individuals are created.
However, the parent population is chosen among the overall $\lambda$ offspring
population. In this way information sharing is encouraged among individuals created
for neighboring search directions, provided that the neighbors' fitness is
competitive with the locals'. Fig. 1 presents the block diagram of the extended
multiobjective GA-SRM.

Once the offspring have been evaluated, a set of non-dominated solutions is
sought for each search direction, i.e. for the k-th search direction non-domination
is checked only among the offspring created for that search direction. Two
secondary populations keep the non-dominated solutions. $P_{cur}(t)$ keeps the
non-dominated solutions obtained from the offspring population at generation t and
$P_{nds}$ keeps the set of the non-dominated solutions found through the generations.
$P_{nds}$ is updated at each generation with $P_{cur}(t)$. In the halftoning problem
an image is divided into blocks and the GA is applied to each image block.
Hence, the GA would generate a set of non-dominated solutions for each image
block. Since we are interested in generating simultaneously various Pareto optimal
"whole" images, a decision making process is integrated to choose only one
solution for each search direction in each image block. Thus, among the various
non-dominated solutions found for a given search direction, we choose the one
that minimizes the combined error of $E_m$ and $E_c$ in that particular direction.

4.3 CM and SRM for Halftoning Problem 

In the halftoning problem an individual is represented as an n x n
two-dimensional structure. In this work we use the same two-dimensional operators,
CM (Crossover and Mutation) and SRM-ADB (Self Reproduction with Mutation
- Adaptive Dynamic Block), presented in our earlier work to create offspring.

CM first crosses over two previously selected parents interchanging either
their rows or columns, as in the earlier GA based halftoning work, and then it
applies standard mutation inverting bits with a small mutation probability per bit,
$p_m^{(CM)}$, analogous to canonical GAs. Thus, mutation in CM is of a quantitative
nature, after which the number of 0s and 1s may change. It may be worth trying more
specialized approaches to implementing crossover; however, this point will not be
discussed in this work.

SRM, on the other hand, first creates an exact copy of a previously selected
individual from the parent population and then applies mutation only to the
bits inside a mutation block. SRM is provided with an Adaptive Dynamic-Block
(ADB) mutation schedule similar to Adaptive Dynamic-Segment mutation
(ADS). With ADB, mutation is directed only to a block (square region)
of the chromosome and the mutation block area $\ell \times \ell$ is dynamically adjusted
to $\ell/2 \times \ell/2$ every time the normalized mutants survival ratio $\gamma$ given by
Eq. (18) falls under a threshold $\tau$. The block's side length $\ell$ varies from n to 2,
[n, 2]. The offset position of the mutation block is chosen at random for each
chromosome. The adaptive mechanism in SRM is designed to control the required
exploration-exploitation balance during the search process.
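
The ADB schedule described above can be sketched as follows. The function names are hypothetical, the default threshold 0.40 is the value listed for GA-SRM in Table 1, and the sketch assumes that the mutation block must lie fully inside the individual (the text only says the offset is chosen at random).

```python
import random


def next_block_side(side, gamma, tau=0.40, min_side=2):
    """Halve the block side when gamma falls under tau, never going below min_side."""
    if gamma < tau and side > min_side:
        return max(min_side, side // 2)
    return side


def random_block_offset(n, side, rng):
    """Pick a top-left corner so the side x side block fits in the n x n individual.

    Assumption: no wrap-around at the borders.
    """
    return rng.randrange(n - side + 1), rng.randrange(n - side + 1)


# Side lengths visited for a 16 x 16 block when gamma stays under the threshold.
side, schedule = 16, [16]
while next_block_side(side, gamma=0.1) != side:
    side = next_block_side(side, gamma=0.1)
    schedule.append(side)

rng = random.Random(0)
top, left = random_block_offset(16, 4, rng)
```

For n = 16 the block side follows 16, 8, 4, 2 and then stays at 2, matching the stated range [n, 2].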

The effect of ADB's mutation on the distribution of 0s and 1s within an
individual could be of a qualitative or quantitative nature. It has been verified
in our earlier work that for the halftoning problem ADB with qualitative mutation
shows better performance than ADB with quantitative mutation (i.e. bit flipping
mutation). Since qualitative mutation does not change the number of 0s and 1s
within an individual, it has an impact only on the spatial resolution error $E_c$,
while quantitative mutation has an impact on both $E_m$ and $E_c$ in Eqs. (12) and
(13). Thus, qualitative mutation is less disruptive and can take better advantage
of the high correlation among contiguous pixels in an image, contributing to
a more effective search. Therefore, in this work we use ADB with qualitative
mutation, which is implemented as a bit swapping process. Note that there is
no need to set a mutation probability in qualitative mutation since all pairs of
bits within the mutation block are simply swapped.
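
The count-preserving property of qualitative mutation can be illustrated with a short sketch. How the pairs of bits are formed is not specified in the text, so the random pairing used here is an assumption, and the function name is hypothetical.

```python
import random


def qualitative_mutation(image, top, left, side, rng):
    """Swap randomly paired bits inside the block; the number of 1s is unchanged.

    Assumption: pairs are formed by shuffling the block's cells and swapping
    neighbors in the shuffled order. The source does not state the pairing rule.
    """
    cells = [(r, c) for r in range(top, top + side) for c in range(left, left + side)]
    rng.shuffle(cells)
    mutated = [row[:] for row in image]  # SRM works on a copy of the parent
    for (r1, c1), (r2, c2) in zip(cells[0::2], cells[1::2]):
        mutated[r1][c1], mutated[r2][c2] = mutated[r2][c2], mutated[r1][c1]
    return mutated


rng = random.Random(1)
parent = [[(r * 3 + c) % 2 for c in range(8)] for r in range(8)]
child = qualitative_mutation(parent, top=2, left=2, side=4, rng=rng)
```

Since bits are only exchanged, the gray level of the block is preserved and only the spatial arrangement changes, which is why this operator affects $E_c$ but not $E_m$.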

5 Experimental Results and Discussion 

We observe and compare the performance of four kinds of GAs generating
halftone images: (i) a simple GA that uses CM and proportional selection, similar
to the GA used in earlier halftoning work (denoted as cGA), (ii) an extended cGA
using the same multiobjective technique described in Section 4.2 (denoted as moGA),
(iii) a GA with SRM that uses CM, SRM and $(\mu, \lambda)$ proportional selection
(denoted as GA-SRM), and (iv) the extended multiobjective GA-SRM (denoted as
moGA-SRM).

The GAs are applied to SIDBA’s benchmark images in our simulation. The 
size of the original image is 256 x 256 pixels with R = 256 gray levels. An image 
is divided into 256 non-overlapping blocks, each one of size nxn = 16 x 16 pixels. 
For each block, the algorithms were set with different seeds for the random initial
population.

We define 11 search directions, N = 11, setting $W = \{\omega^1, \omega^2, \ldots, \omega^{11}\}$ =
{(0.0, 1.0), (0.1, 0.9), ..., (1.0, 0.0)}. With $\omega^1 = (0.0, 1.0)$ the search focuses
exclusively on $E_c$'s space and with $\omega^{11} = (1.0, 0.0)$ on $E_m$'s; whereas with $\omega^k$,
$2 \le k \le 10$, the search focuses on the combined space of $E_c$ and $E_m$. moGA and
moGA-SRM generate simultaneously one image for each direction in a single
run. On the other hand, to generate the 11 images with either cGA or GA-SRM
an equal number of separate runs are carried out, each one using a different $\omega^k$
as weighting parameter. Unless stated otherwise, the GAs are set with the
parameters detailed in Table 1 and the experimental image used is "Lenna". The
values set for crossover and mutation probabilities in cGA are the same as those
used in the earlier work on which cGA is based. The image quality attained by the
cGA with a 200 parent population and the same T = 4 x 10^ evaluations used in that
work are taken as a reference for comparison in our study. The number of
generations performed for each algorithm is calculated as $T/\lambda$.

Table 1. Genetic algorithms parameters

Parameter             cGA                 moGA                GA-SRM              moGA-SRM
Selection             Proport.            Proport.            (μ, λ) Proport.     (μ, λ) Proport.
Mating                (x_i, x_j), i ≠ j   (x_i, x_j), i ≠ j   (x_i, x_j), i ≠ j   (x_i, x_j), i ≠ j
p_c                   0.6                 0.6                 1.0                 1.0
p_m^(CM)              0.001               0.001               0.001               0.001
μ^k : λ^k             -                   1 : 1               1 : 2               1 : 2
λ^k_CM : λ^k_SRM      -                   -                   1 : 1               1 : 1
τ                     -                   -                   0.40                0.40



Table 2 shows the average in all image blocks of the non-normalized combined
errors $e^k(x) = \omega_1^k E_m(x) + \omega_2^k E_c(x)$ by cGA(200) after T evaluations for
each search direction $\omega^k$, $1 \le k \le 11$, under column W. For the other algorithms,
under W we present the fraction of T at which the algorithm reaches similar image
quality (for cGA(200) these values are all 1.00 and are shown right below
the combined error). Column $T^W$ indicates the overall evaluations needed to
generate the 11 images. Since the cGA generates one image at a time, it needs
11T evaluations to generate all 11 images. The first moGA row shows results
by the multiobjective simple GA with a $\mu^k = 18$ parents and a $\lambda^k = 18$, $\lambda = 198$
offspring configuration. moGA simultaneously generates the 11 images and needs
approximately 2.43T to guarantee that all images would have at least the same
quality as cGA(200). moGA's second row shows results by moGA with a $\mu^k = 4$
parents and a $\lambda^k = 4$, $\lambda = 44$ offspring configuration. In this case population size
reduction in moGA accelerates the overall convergence a little more and still
produces better images than cGA(200). It should be noticed that population
reductions in cGA accelerate convergence, but the search is affected by a loss of
diversity and the final image quality is inferior to cGA(200)'s. moGA benefits from
the information sharing induced by selection (see explanation below for Fig. 2)
and can tolerate population reductions. Compared with cGA, the results by
moGA represent an enormous reduction in processing time and illustrate the
benefits that can be achieved by including multiobjective techniques within GAs.

Footnotes:

GA-SRM searches in only one direction at a time and the population related
parameters $\lambda^k$, $\lambda_{CM}^k$ and $\lambda_{SRM}^k$ should be read without the index k.

The entire number of evaluations required by the single objective GAs to generate
all 11 images is given by the sum of the evaluations expended in each direction.

In the case of multiple objective GAs, due to the concurrent search, the maximum
number of evaluations among all search directions determines the overall number
of evaluations needed to generate all 11 images.



Table 2. Evaluations to generate high quality images (Lenna)

                   W = {ω^1, ..., ω^11}
Algorithm          ω^1    ω^2    ω^3    ω^4    ω^5    ω^6    ω^7    ω^8    ω^9    ω^10   ω^11   T^W
combined error     121.0  111.4  100.6  89.5   78.2   66.9   55.5   44.2   32.8   21.5   10.1   -
cGA(200)           1.00   1.00   1.00   1.00   1.00   1.00   1.00   1.00   1.00   1.00   1.00   11T
moGA(18,198)       1.43   2.43   1.65   1.27   1.21   1.00   0.86   0.76   0.70   0.65   0.72   2.43T
moGA(4,44)         1.12   2.30   1.44   1.36   1.20   1.02   0.85   0.79   0.73   0.66   0.79   2.30T
GA-SRM(2,4)        0.40   0.23   0.15   0.13   0.12   0.11   0.10   0.09   0.09   0.08   0.08   1.58T
moGA-SRM(9,198)    1.12   1.07   0.58   0.44   0.30   0.27   0.24   0.23   0.22   0.21   0.21   1.12T
moGA-SRM(2,44)     1.56   1.03   0.50   0.30   0.20   0.16   0.15   0.13   0.12   0.12   0.12   1.56T
moGA-SRM*(2,44)    0.96   0.92   0.40   0.31   0.22   0.17   0.15   0.14   0.13   0.13   0.13   0.96T



Row GA-SRM(2,4) presents results by GA-SRM with a 2 parents and 4
offspring configuration. GA-SRM, even with a very scaled down population
configuration, considerably reduces processing time to generate high quality images
for all combinations of weighting parameters. GA-SRM, for this particular
image, would need approximately 1.58T to generate all 11 images. Note that
GA-SRM sequentially generating the 11 images is faster than moGA.
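
The overall evaluation counts can be checked from the per-direction fractions of Table 2: single objective GAs run once per direction, so their costs add up, while multiobjective GAs search all directions concurrently, so the slowest direction determines the total. The short sketch below reproduces that arithmetic; the dictionary layout and names are only illustrative.

```python
# Fractions of T per search direction, copied from Table 2.
table2 = {
    "cGA(200)": [1.00] * 11,
    "moGA(18,198)": [1.43, 2.43, 1.65, 1.27, 1.21, 1.00, 0.86, 0.76, 0.70, 0.65, 0.72],
    "GA-SRM(2,4)": [0.40, 0.23, 0.15, 0.13, 0.12, 0.11, 0.10, 0.09, 0.09, 0.08, 0.08],
    "moGA-SRM(9,198)": [1.12, 1.07, 0.58, 0.44, 0.30, 0.27, 0.24, 0.23, 0.22, 0.21, 0.21],
}

# Algorithms that generate one image per run.
SEQUENTIAL = {"cGA(200)", "GA-SRM(2,4)"}


def overall_evaluations(name):
    """Sum over directions for sequential runs, maximum for concurrent search."""
    row = table2[name]
    return sum(row) if name in SEQUENTIAL else max(row)


totals = {name: round(overall_evaluations(name), 2) for name in table2}
```

This gives 11.0 for cGA(200), 2.43 for moGA(18,198), 1.58 for GA-SRM(2,4) and 1.12 for moGA-SRM(9,198), in units of T.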

The first moGA-SRM row shows results by the proposed multiobjective GA-SRM
with a $\mu^k = 9$ parents and a $\lambda^k = 18$, $\lambda = 198$ offspring configuration.
Compared with moGA we can see that the inclusion of SRM notably increases
the multiobjective algorithm's performance, needing no more than 1.12T to
generate the 11 images, which is faster than GA-SRM. Results by a scaled down
population configuration are shown in row moGA-SRM(2,44), which represents a
$\mu^k = 2$ parents and a $\lambda^k = 4$, $\lambda = 44$ offspring configuration. The population size
reduction in moGA-SRM accelerates convergence in all but one search direction
(see under $\omega^1$) and the overall evaluation time is similar to GA-SRM. From
GA-SRM and moGA-SRM results we see that parallel mutation SRM can greatly
improve the performance of single objective as well as multiobjective genetic
algorithms in the halftoning problem.

We observe that moGA(4,44), which uses CM but not SRM, produces faster
convergence than moGA-SRM only for $\omega^1$ ($e^1 = 0.0 E_m + 1.0 E_c$). It seems
that CM alone is particularly useful for searching in $E_c$'s search space. However,
when the search involves both $E_m$'s and $E_c$'s spaces the interaction of CM and
SRM produces better results. We conduct an experiment in which we favor CM's
offspring over SRM's only in the $\omega^1$ direction. In row moGA-SRM*(2,44) we
show results using a configuration that creates offspring in direction $\omega^1$ only
with CM, i.e. $\lambda_{CM}^1 = 4$, $\lambda_{SRM}^1 = 0$ and $\lambda_{CM}^k = 2$, $\lambda_{SRM}^k = 2$ for $2 \le k \le 11$. This
has the effect of accelerating convergence in search direction $\omega^1$ and therefore
reducing the overall evaluation time to 0.96T.

$E_m$ and $E_c$ represent fitness landscapes with different degrees of difficulty for
the GAs. $E_m$'s landscape is smoother than $E_c$'s and the GAs are expected to
converge faster in $E_m$'s direction. This is corroborated by the results obtained
by the GAs. In Table 2 we can see that for $\omega^k$ with k > 6, $E_m$'s directions,
the algorithms need less time to converge. It should be especially noticed that
moGA-SRM for those directions finds high quality images in less than 0.2T. This
behavior and the results of the last experiment mentioned above suggest that
it may be worth trying dynamic configurations so that more resources could be
assigned to those directions that require more time to converge, reducing the
overall time needed to generate images simultaneously.



Table 3. Actual percentage of evaluations expended in each search direction

Algorithm          ω^1    ω^2    ω^3    ω^4    ω^5    ω^6    ω^7    ω^8    ω^9    ω^10   ω^11
cGA(200)           100.0  100.0  100.0  100.0  100.0  100.0  100.0  100.0  100.0  100.0  100.0
moGA(18,198)       13.0   22.1   15.0   11.5   11.0   9.1    7.8    6.9    6.4    5.9    6.5
moGA(4,44)         10.2   20.9   13.1   12.4   10.9   9.3    7.7    7.2    6.6    6.0    7.2
GA-SRM(2,4)        40.0   23.0   15.0   13.0   12.0   11.0   10.0   9.0    9.0    8.0    8.0
moGA-SRM(9,198)    10.2   9.7    5.3    4.0    2.7    2.5    2.2    2.1    2.0    1.9    1.9
moGA-SRM(2,44)     14.2   9.4    4.5    2.7    1.8    1.5    1.4    1.2    1.1    1.1    1.1
moGA-SRM*(2,44)    8.7    8.4    3.6    2.8    2.0    1.5    1.4    1.3    1.2    1.2    1.2



In Table 2, moGA's and moGA-SRM's rows show the evaluations expended
by the algorithm in all search directions. The actual percentage of the
evaluations expended in each search direction is shown in Table 3. From this table it
can be seen that with the multiobjective algorithms there is a substantial
reduction of the actual number of evaluations for each search direction. These
reductions are explained by the information sharing induced by the selection process.
As mentioned in Section 4.2 and indicated by Eq. (19), the individuals with higher
fitness in a specific direction are selected as parents. Thus, the individuals chosen
to be parents for the k-th search direction at generation t may have been created for
neighboring directions at generation t-1. To verify this point we also observe the
composition of the parent population for each search direction. Fig. 2 shows the
average distribution for some of the directions after 0.1T and T evaluations,
respectively. For example, in Fig. 2(a), the parent population of one of the
directions shown is on average composed of 18% of individuals coming from another
direction, 30% from itself, and 13% from a third one. From these figures we can
see that each search direction benefits from individuals that initially were meant
for other neighboring directions. This information sharing pushes the search
forward, reducing convergence times. Looking at Fig. 2(a) and Fig. 2(b) we can see
that the information sharing is higher during the initial stages of the search.

Fig. 3 illustrates typical transitions of the non-normalized combined error
$e^k(x)$ over the number of evaluations for some of the search directions by the
GAs. The plots are cut after T evaluations. These figures show the higher
convergence velocity and higher convergence reliability
(lower errors) of the algorithms that include SRM, GA-SRM and moGA-SRM.
In general, moGA is faster than the cGA, but their final image quality tends to
be the same. Also, it should be noticed that results by moGA and moGA-SRM
are achieved simultaneously in one run (thus, T for these algorithms indicates
the evaluations expended in all search directions).

Fig. 2. moGA-SRM's average parent population distribution

Fig. 3. Error transition for various $\omega^k$



Fig. 4 shows the original image "Lenna" and the images generated by two
conventional halftoning techniques: ordered dithering (screen) and error diffusion.
Fig. 5 shows some of the images simultaneously generated by moGA-SRM. From
these figures we can see that moGA-SRM generates images that are more pleasant
to the human observer than those of traditional techniques. Another point to be
remarked is that traditional halftoning techniques can generate only one image. On
the other hand, among the images generated by moGA-SRM there is a gradual
difference according to spatial and gray level resolution, which makes the GA based
halftoning technique more flexible to users' requirements as well as more robust
to constraints imposed by displaying and printing devices.

Fig. 5. Lenna's simultaneously generated images by moGA-SRM*(2,44) after 0.96T

With regard to processing time, running software implementations of the
algorithms on a Pentium III processor (600 MHz), conventional techniques need
only a few seconds to generate one image while GA-SRM (also implemented in
software) needs about 8 minutes. Note that GA based techniques in this study
process one block at a time, always starting with random initial populations.
Due to the high correlation among neighboring blocks of an image, reductions in
processing time are expected by using previously generated image blocks in the
initial populations of the subsequent blocks. However, it is clear that, from a
processing time standpoint, in order to apply GA based halftoning techniques
on-line they must be improved further to reduce as much as possible the number
of evaluations needed to generate higher quality images. Also, the GA's final
implementation for industrial application must be in hardware.

Finally, we should also say that similar results were obtained for other
SIDBA benchmark images.

6 Conclusions 

In this work we have extended an improved GA (GA-SRM) to a multiobjective 
optimization GA (moGA-SRM) for the image halftoning problem aiming to si- 
multaneously generate halftone images with various combinations of gray level 
and spatial resolution. 

GA-SRM is based on an empirical model of GA that puts parallel genetic op- 
erators in a cooperative-competitive stand with each other. To extend GA-SRM 
we follow a cooperative population search with aggregation selection preserving 
the fundamental features of the cooperative-competitive model. We compare the 
performance of four genetic algorithms generating halftone images: (i) a single 
objective simple GA (cGA), (ii) a single objective GA-SRM, (iii) a multiobjective 
simple GA (moGA), (iv) the proposed multiobjective GA-SRM (moGA-SRM). 

From our experimental results we observe that multiobjective techniques ben- 
efit from information sharing and can greatly reduce processing time to generate 
simultaneously high quality images. To generate 11 images moGA requires only 
about 21% of the evaluations used by cGA. The cooperative-competitive model
for parallel operators helps to increase the performance of single and
multiobjective GAs in this problem, reducing processing time even further. GA-SRM
requires about 15% and moGA-SRM about 9% of the evaluations used by cGA.

As future work, important issues to be explored related to the halftoning
problem are (i) the effect of the definition of the weights set on the algorithm’s 
stability and convergence, (ii) dynamic and parallel hierarchical configurations 
for moGA-SRM in order to accelerate the overall time needed to generate im- 
ages simultaneously. Also, we are planning to continue studying moGA-SRM’s 
behavior in a wider range of problems that include more than two objectives 
and use it in other real world applications.

Abstract. We implement multiobjective evolutionary algorithms for
the optimization of micro-fluidic devices. In this work we discuss the 
development of multimembered evolution strategies with step size adap- 
tation in conjunction with the Strength Pareto Approach. In order to 
support targeting, an extension of the Strength Pareto Evolutionary Al- 
gorithm is proposed. The results suggest a novel design for micro-fluidic 
devices used for DNA sequencing. 



1 Introduction 

Evolutionary Algorithms (EAs) such as Evolution Strategies or Genetic Algo- 
rithms have become the method of choice for optimization problems that are 
too complex to be solved using deterministic techniques such as linear
programming or gradient (Jacobian) methods. The enormous number of applications
(Beasley (1997)) and the still growing interest in this field are due to several
advantages of EAs compared to gradient based methods for complex problems.
EAs require little knowledge about the problem being solved, they are easy 
to implement, robust and inherently parallel. To solve a certain optimization 
problem, it is enough to require that one is able to evaluate the objective (cost) 
function for a given set of input parameters. The property of parallelism becomes 
more and more important with the increasing power and availability of large 
parallel computer systems. Because of their universality, ease of implementation 
and fitness for parallel computing, EAs often take less time to find the optimal 
solution than gradient methods. However, most real-world problems involve 
simultaneous optimization of several, often mutually conflicting objectives.
Multiobjective EAs are able to find optimal trade-offs in order to get a set of 
solutions that are optimal in an overall sense. In multiobjective optimization, 
gradient based methods are often impossible to apply. Multiobjective EAs 
however can always be applied and they inherit all of the favorable properties 
from their single objective relatives. 



Section 2 of this paper introduces the main concepts of single objective EAs.
Section 3 extends these ideas to multiobjective cases and introduces the principles
of dominance and Pareto optimality. The following sections describe the Strength
Pareto Approach used in this work and extend it with a targeting capability.
Finally, the results of both single and multiobjective optimization of
a microchannel flow are shown and discussed.

2 Single Objective Evolutionary Algorithms 

The basic idea for single objective EAs is to imitate the natural process of
biological evolution. The problem to be solved is therefore described using a
certain number of parameters (design variables). One then creates a group of
$\lambda$ (> 0) different parameter vectors and considers it as a population of individuals.
The quantity $\lambda$ is called the population size. The quality of a certain vector of
parameters (i.e. an individual in the population) is expressed in terms of a scalar
valued fitness function (objective function). Depending on whether one wants to
minimize or maximize the objective function, individuals (i.e. parameter vectors)
with lower or greater fitness are considered better, respectively. The algorithm
then proceeds to choose the $\mu$ ($\mu < \lambda$) best individuals out of the population
to become the parents of the next generation (natural selection, survival of the
fittest). Therefore, $\mu$ denotes the number of parents. The smaller $\mu$ is chosen
compared to $\lambda$, the higher the selection pressure will be. Out of the $\mu$ individuals
chosen to be parents for the next generation, one then creates a new population
of $\lambda$ offspring $x'_i$ by applying mutation on the parents $x_j$ as follows:

$$ x'_i = x_j + \mathcal{N}(0, \Sigma), \quad i = 1, \ldots, \lambda, \quad j \in \{1, \ldots, \mu\} \qquad (1) $$

where $\mathcal{N}(0, \Sigma)$ denotes a vector of jointly distributed Gaussian random
numbers with zero mean and covariance matrix $\Sigma$. The standard deviations (i.e. the
square roots of the diagonal elements $\sigma_i^2$ of $\Sigma$) of the additive random
numbers determine "how far away from its parent a child will be" and are called
step sizes of the mutation. Now, the first iteration is completed and the
algorithm loops back to the evaluation of the fitness function for the new
individuals. Several different techniques for adaptation and control of the step
size have been developed (see e.g. Back (1997a), Back (1997b), Back (1993),
Hansen & Ostermeier (1996) or Hansen & Ostermeier (1997)). In the following
subsections, some of the single objective Evolution Strategies used in this
work are outlined.

2.1 The (1+1)-ES

One of the simplest and yet powerful evolution strategies is the "one plus one
evolution strategy", denoted by (1+1)-ES. In this strategy, both the number
of parents and the population size (i.e. number of offspring) are set to one:
$\mu = \lambda = 1$. Mutation is accomplished by adding a vector of usually uncorrelated
Gaussian random numbers, i.e. $\Sigma = diag(\sigma_i^2)$ is a diagonal matrix. Step size
adaptation can be performed according to Rechenberg's 1/5-rule: if less than
20% of the generations are successful (i.e. offspring better than parent), then
decrease the step size for the next generation; if more than 20% are successful,
then increase the step size in order to accelerate convergence. This adaptation
is done every N · L generations, where N is the number of parameters (i.e.
dimension of search space) and L is a constant, usually equal to one. Selection
is done out of the set union of parent and offspring, i.e. the better one of the
two is chosen to become the parent of the next generation.
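
A minimal sketch of a (1+1)-ES with the 1/5-rule, written for minimization, may make the loop concrete. It is not the implementation used in this work: the function name, the adaptation factor 0.85, the isotropic step size and the sphere test function are all assumptions chosen for illustration.

```python
import random


def one_plus_one_es(objective, x0, sigma=1.0, generations=2000, seed=0):
    """Minimize `objective` with a (1+1)-ES and Rechenberg's 1/5 success rule."""
    rng = random.Random(seed)
    n = len(x0)
    parent, f_parent = list(x0), objective(x0)
    successes = 0
    for g in range(1, generations + 1):
        # Mutation: add uncorrelated Gaussian noise with a common step size sigma.
        child = [xi + rng.gauss(0.0, sigma) for xi in parent]
        f_child = objective(child)
        # Plus selection: the better of parent and offspring survives.
        if f_child < f_parent:
            parent, f_parent = child, f_child
            successes += 1
        # Adapt every n generations (the constant L is taken as one).
        if g % n == 0:
            rate = successes / n
            if rate < 0.2:
                sigma *= 0.85  # assumed factor: too few successes, take smaller steps
            elif rate > 0.2:
                sigma /= 0.85  # many successes, take larger steps
            successes = 0
    return parent, f_parent


def sphere(x):
    """Simple test function with its minimum of 0 at the origin."""
    return sum(xi * xi for xi in x)


best_x, best_f = one_plus_one_es(sphere, [5.0, 5.0, 5.0])
```

Because selection is elitist, `best_f` can never exceed the starting value `sphere([5.0, 5.0, 5.0])`, which is 75.0.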

2.2 The $(\mu, \lambda)$-ES

A slightly more advanced method is to take one or more parents and even
more offspring, i.e. $\mu \ge 1$ and $\lambda > \mu$. Mutation is accomplished in a
similar way as with the (1+1)-ES. Besides the 1/5 rule, another method
for step size adaptation becomes available which is called self-adaptive
mutation (Back (1997a)). In this method, the mutation steps are adapted every
generation. They are either increased, decreased or kept the same, each with a 
probability of 1/3. On the average, 1/3 of the offspring will now be closer to their 
parents than before, 1/3 keeps progressing at the same speed and 1/3 explores 
further areas. Depending on how far away from the optimum we currently are, 
one of these three groups will do better than the others and therefore, more 
individuals out of it will be selected to the next generation, where their step 
sizes are inherited. The algorithm adapts the step size by itself, i.e. by means of 
mutation and selection. 
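
The self-adaptive scheme described above can be sketched as follows, again for minimization. The function name, the multiplicative factor 1.5, the population sizes and the test function are assumptions for the illustration and are not taken from the source.

```python
import random


def self_adaptive_es(objective, x0, mu=3, lam=12, sigma0=1.0,
                     generations=300, factor=1.5, seed=0):
    """(mu, lambda)-ES in which each offspring inherits and mutates its step size."""
    rng = random.Random(seed)
    parents = [(list(x0), sigma0)] * mu
    best_f, best_x = objective(x0), list(x0)
    for _ in range(generations):
        offspring = []
        for _ in range(lam):
            x, sigma = rng.choice(parents)
            # Increase, keep or decrease the step size, each with probability 1/3.
            sigma = sigma * rng.choice((factor, 1.0, 1.0 / factor))
            child = [xi + rng.gauss(0.0, sigma) for xi in x]
            offspring.append((objective(child), child, sigma))
        # Comma selection: parents are chosen among the offspring only.
        offspring.sort(key=lambda item: item[0])
        parents = [(child, sigma) for _, child, sigma in offspring[:mu]]
        if offspring[0][0] < best_f:
            best_f, best_x = offspring[0][0], offspring[0][1]
    return best_x, best_f


def sphere(x):
    """Simple test function with its minimum of 0 at the origin."""
    return sum(xi * xi for xi in x)


sa_x, sa_f = self_adaptive_es(sphere, [5.0, 5.0, 5.0])
```

No explicit success statistics are kept here: offspring whose inherited step size suits the current distance to the optimum tend to be selected, and their step sizes are passed on.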



2.3 The $(\mu/\mu_I, \lambda)$-CMA-ES

The Covariance Matrix Adaptation is a sophisticated method for online
adaptation of step sizes in $(\mu, \lambda)$-ES with intermediate recombination (i.e. averaging
of parents). It was first described by Hansen & Ostermeier (1996) and further
improved and evaluated by Hansen & Ostermeier (1997). For a complete
description of the algorithm, the reader is referred to the latter publication. The
basic idea is to adapt step sizes and covariances in such a way that the longest
axis of the hyper-ellipsoid of the mutation distribution always aligns in the
direction of greatest estimated progress. This is done by accumulating information
about former mutation steps and their success (evolution path) and searching it
for correlations. Besides this very sophisticated method for step size adaptation,
a CMA-ES also includes mutation (with $\Sigma$ now being a full matrix) and selection.



3 Multiobjective Evolutionary Algorithms 

As soon as there are many (possibly conflicting) objectives to be optimized si- 
multaneously, there is no longer a single optimal solution but rather a whole set 
of possible solutions of equivalent quality. Consider for example the design of an 
automobile. Possible objectives could be: minimize cost, maximize speed, mini- 
mize fuel consumption and maximize luxury. These goals are clearly conflicting 
and therefore there is no single optimum to be found. Multiobjective EAs can 
yield a whole set of potential solutions - which are all optimal in some sense - and
give the engineers the option to assess the trade-offs between different designs.
One then could for example choose to create three different cars according to
different marketing needs: a slow low-cost model which consumes least fuel, an
intermediate solution and a luxury sports car where speed is clearly the primary
objective. Evolutionary Algorithms are well suited to multiobjective optimization
problems as they are fundamentally based on biological processes, which
are inherently multiobjective.

After the first pioneering work on multiobjective evolutionary optimization in the
eighties (Schaffer (1984), Schaffer (1985)), several different algorithms have
been proposed and successfully applied to various problems. For comprehensive
overviews and discussions, the reader is referred to Fonseca & Fleming (1995),
Horn (1997), Van Veldhuizen & Lamont (1998) and Coello (1999).

3.1 Dominance and Pareto-Optimality 

In contrast to fully ordered scalar search spaces, multidimensional search spaces 
are only partially ordered, i.e. two different solutions are related to each other in 
two possible ways: either one dominates the other or none of them is dominated. 

Definition 1: Consider without loss of generality the following multiobjective
optimization problem with m decision variables x (parameters) and n objectives y:

$$
\begin{aligned}
\text{Maximize}\quad & y = f(x) = \bigl(f_1(x_1, \ldots, x_m), \ldots, f_n(x_1, \ldots, x_m)\bigr) \\
\text{where}\quad & x = (x_1, \ldots, x_m) \in X \\
& y = (y_1, \ldots, y_n) \in Y
\end{aligned}
\tag{2}
$$

and where x is called decision (parameter) vector, X parameter space, y
objective vector and Y objective space. A decision vector $a \in X$ is said to
dominate a decision vector $b \in X$ (also written as $a \succ b$) if and only if:

$$
\forall i \in \{1, \ldots, n\}: f_i(a) \ge f_i(b) \;\wedge\; \exists j \in \{1, \ldots, n\}: f_j(a) > f_j(b)
\tag{3}
$$

Additionally, we say a covers b ($a \succeq b$) if and only if $a \succ b$ or $f(a) = f(b)$.



Based on this convention, we can define nondominated, Pareto-optimal solutions
as follows:

Definition 2: Let $a \in X$ be an arbitrary decision (parameter) vector.

1. The decision vector a is said to be nondominated regarding a set $X' \subseteq X$ if
and only if there is no vector in X' which dominates a; formally:

$$
\nexists\, a' \in X' : a' \succ a
\tag{4}
$$

2. The decision (parameter) vector a is called Pareto-optimal if and only if a
is nondominated regarding the whole parameter space X.

If the set X' is not explicitly specified, the whole parameter space X is
implied.

Pareto-optimal parameter vectors cannot be improved in any objective without
causing a degradation in at least one of the other objectives. They represent
in that sense globally optimal solutions. Note that a Pareto-optimal set does not
necessarily contain all Pareto-optimal solutions in X. The set of objective
vectors $f(a'), a' \in X'$, corresponding to a set of Pareto-optimal parameter vectors
$a' \in X'$ is called “Pareto-optimal front” or “Pareto-front”.
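
The dominance and covering relations of Definition 1 and the nondominated filter of Definition 2 can be illustrated with a short sketch. It is not taken from the paper: the function names are hypothetical, objective vectors are assumed to be plain tuples of floats, and all objectives are assumed to be maximized as in Definition 1.

```python
from typing import List, Sequence

Vector = Sequence[float]


def dominates(fa: Vector, fb: Vector) -> bool:
    """True if objective vector fa dominates fb (all objectives maximized)."""
    no_worse = all(x >= y for x, y in zip(fa, fb))
    strictly_better = any(x > y for x, y in zip(fa, fb))
    return no_worse and strictly_better


def covers(fa: Vector, fb: Vector) -> bool:
    """True if fa dominates fb or both objective vectors are equal."""
    return dominates(fa, fb) or tuple(fa) == tuple(fb)


def nondominated(objs: List[Vector]) -> List[int]:
    """Indices of the vectors in objs that no other vector in objs dominates."""
    return [
        i
        for i, fi in enumerate(objs)
        if not any(dominates(fj, fi) for j, fj in enumerate(objs) if j != i)
    ]
```

For a minimization problem the same sketch applies after negating every objective. The filter compares all pairs, so its cost grows quadratically with the size of the set, which is acceptable for the small populations used here.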

3.2 Difficulties in Multiobjective Optimization

In extending the ideas of single objective EAs to multiobjective cases, two major 
problems must be addressed: 

1. How to accomplish fitness assignment and selection in order to guide the 
search towards the Pareto-optimal set. 

2. How to maintain a diverse population in order to prevent premature conver- 
gence and achieve a well distributed, wide spread trade-off front. 

Note that the objective function itself no longer qualifies as a fitness function
since it is vector valued and fitness has to be a scalar value. Different
approaches to relate the fitness function to the objective function can be classified
with regard to the first issue. For further information, the reader is referred to
Horn (1997). The second problem is usually solved by introducing elitism and
intermediate recombination. Elitism is a way to ensure that good individuals
do not get lost (by mutation or set reduction), simply by storing them away in
an external set, which only participates in selection. Intermediate recombination,
on the other hand, averages the parameter vectors of two parents in order to
generate one offspring according to:

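$$
\begin{aligned}
x'_j &= \alpha\, x^{(i_1)}_j + (1 - \alpha)\, x^{(i_2)}_j, \qquad i_1, i_2 \in \{1, \ldots, \mu\} \\
x^{(i)}_j &= x'_j + \mathcal{N}(0, S), \qquad i = 1, \ldots, \lambda
\end{aligned}
$$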

Arithmetic recombination is a special case of intermediate recombination
where $\alpha = 0.5$.
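
As a rough illustration of intermediate recombination followed by Gaussian mutation, consider the sketch below. It rests on assumptions that the text does not fix: the covariance is taken to be diagonal (one standard deviation per parameter), the two parents are drawn uniformly with replacement, and the helper names are hypothetical.

```python
import random
from typing import List, Optional


def intermediate_recombination(p1: List[float], p2: List[float],
                               alpha: float = 0.5) -> List[float]:
    """Weighted average of two parents; alpha = 0.5 is arithmetic recombination."""
    return [alpha * a + (1.0 - alpha) * b for a, b in zip(p1, p2)]


def mutate(x: List[float], sigmas: List[float], rng: random.Random) -> List[float]:
    """Add zero-mean Gaussian noise with one standard deviation per parameter."""
    return [xj + rng.gauss(0.0, s) for xj, s in zip(x, sigmas)]


def make_offspring(parents: List[List[float]], sigmas: List[float], lam: int,
                   alpha: float = 0.5,
                   rng: Optional[random.Random] = None) -> List[List[float]]:
    """Create lam children from the parents (assumed: parents drawn uniformly)."""
    rng = rng or random.Random()
    children = []
    for _ in range(lam):
        p1, p2 = rng.choice(parents), rng.choice(parents)
        child = intermediate_recombination(p1, p2, alpha)
        children.append(mutate(child, sigmas, rng))
    return children
```

A full covariance matrix, as used by the CMA-ES, would replace the per-parameter noise by a correlated sample; that case is not covered by this sketch.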



4 The Strength Pareto Approach 

For this work, the Strength Pareto Approach for multiobjective optimization
has been used. Comparative studies have shown for a large number of
test cases that, among all major multiobjective EAs, the Strength Pareto
Evolutionary Algorithm (SPEA) is clearly superior (Zitzler & Thiele (1999),
Zitzler, Thiele & Deb (2000)). It is based on the above-mentioned principles
of Pareto-optimality and dominance. The algorithm as proposed by
Zitzler & Thiele (1999) was implemented in a restartable, fully parallel code
as follows:

Step 1: Generate random initial population P and create the empty external
set of nondominated individuals P'.

Step 2: Evaluate objective function for each individual in P in parallel.

Step 3: Copy nondominated members of P to P'.

Step 4: Remove solutions within P' which are covered by any other member of
P'.

Step 5: If the number of externally stored nondominated solutions exceeds a
given maximum N', prune P' by means of clustering.

Step 6: Calculate the fitness of each individual in P as well as in P'.

Step 7: Select individuals from P + P' (multiset union), until the mating pool
is filled.

Step 8: Adapt step sizes of the members of the mating pool.

Step 9: Apply recombination and mutation to members of the mating pool in
order to create a new population P.

Step 10: If maximum number of generations is reached, then stop, else go to
Step 2.



4.1 Fitness Assignment 

In Step 6, all individuals in P and P' are assigned a scalar fitness value. This 
is accomplished in the following two-stage process. First, all members of the 
nondominated set P' are ranked. Afterwards, the individuals in the population 
P are assigned their fitness value. 

Step 1: Each solution $i \in P'$ is assigned a real value $s_i \in [0, 1)$, called strength.
$s_i$ is proportional to the number of population members $j \in P$ for which
$i \succeq j$. Let n denote the number of individuals in P that are covered by
i and assume N to be the size of P. Then $s_i$ is defined as:

$$
s_i = \frac{n}{N + 1}
$$

The fitness $f_i$ of i is equal to its strength: $f_i = s_i \in [0, 1)$.

Step 2: The fitness of an individual $j \in P$ is calculated by summing the strengths
of all external nondominated solutions $i \in P'$ that cover j. Add one to
this sum to guarantee that members of P' always have better fitness
than members of P (note that the fitness is to be minimized):

$$
f_j = 1 + \sum_{i,\; i \succeq j} s_i, \qquad f_j \in [1, N)
\tag{6}
$$
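
The two-stage fitness assignment can be sketched as follows. The sketch assumes maximized objectives as in Definition 1 and uses the strength n / (N + 1) from Zitzler & Thiele (1999), where n counts the population members covered by an external solution; the function names and the tuple representation of objective vectors are illustrative choices, not part of the original implementation.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def covers(fa: Vector, fb: Vector) -> bool:
    """fa covers fb: fa dominates fb (maximization) or both are equal."""
    return all(x >= y for x, y in zip(fa, fb))


def spea_fitness(pop_objs: List[Vector],
                 ext_objs: List[Vector]) -> Tuple[List[float], List[float]]:
    """Return (strengths of the external set, fitness of the population).

    Fitness is to be minimized: external members get values in [0, 1),
    population members get values of at least 1.
    """
    n_pop = len(pop_objs)
    strengths = [
        sum(1 for p in pop_objs if covers(e, p)) / (n_pop + 1)
        for e in ext_objs
    ]
    pop_fitness = [
        1.0 + sum(s for e, s in zip(ext_objs, strengths) if covers(e, p))
        for p in pop_objs
    ]
    return strengths, pop_fitness
```

Note that "covers" reduces to a componentwise greater-or-equal test, because domination or equality of the objective vectors is exactly that condition.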

4.2 Selection and Step Size Adaptation

Step 7 requires an algorithm for the selection of individuals into the mating 
pool and Step 8 includes some method for dynamical adaptation of step sizes 
(i.e. mutation variances). For this paper, selection was done using the following 
binary tournament procedure: 

Step 1: Randomly (uniformly distributed random numbers) select two individ- 
uals out of the population P. 

Step 2: Copy the one with the better (i.e. lower for SPEA) fitness value to the 
mating pool. 

Step 3: If the mating pool is full, then stop, else go to Step 1.

Adaptation of the step sizes was done using the self-adaptive mutation
method (cf. section 2). Each element of P and P' is assigned an individual
step size for every parameter, i.e. $S = \mathrm{diag}(\sigma_i^2)$ is a diagonal matrix for
each individual. The step sizes of all members of the mating pool are then either
increased by 50%, cut to half or kept the same, each with a probability of 1/3.
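
A minimal sketch of the binary tournament and of the step size adaptation is given below. Two points are assumptions rather than statements of the paper: the two tournament candidates are drawn as distinct individuals, and a single factor (1.5, 0.5 or 1.0) is drawn per individual and applied to all of its step sizes. Function names are hypothetical.

```python
import random
from typing import List


def binary_tournament(fitness: List[float], pool_size: int,
                      rng: random.Random) -> List[int]:
    """Fill a mating pool with indices of tournament winners (lower fitness wins)."""
    pool = []
    while len(pool) < pool_size:
        i, j = rng.sample(range(len(fitness)), 2)  # two distinct candidates
        pool.append(i if fitness[i] <= fitness[j] else j)
    return pool


def adapt_step_sizes(sigmas: List[float], rng: random.Random) -> List[float]:
    """Increase by 50%, halve or keep the step sizes, each with probability 1/3."""
    factor = rng.choice((1.5, 0.5, 1.0))
    return [s * factor for s in sigmas]
```

Because the candidates are distinct, the individual with the worst fitness can never win a tournament, which gives a mild but steady selection pressure.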



4.3 Reduction by Clustering 



In Step 5, the number of externally stored nondominated solutions is limited to
some number N'. This is necessary because otherwise P' could grow without bound,
since there is always an infinite number of points along the Pareto-front.
Moreover, one wants to be able to control the number of proposed possible solutions,
because from a decision maker’s point of view, a few points along the front are
often enough. A third reason for introducing clustering is the distribution of
solutions along the Pareto-front. In order to explore as much of the front as
possible, the nondominated members of P' should be equally distributed along the
Pareto-front. Without clustering, the fitness assignment method would probably
be biased towards a certain region of the search space, leading to an unbalanced
distribution of the solutions. For this work, the average linkage method, a
clustering algorithm which has proven to perform well on Pareto optimization, has
been chosen. The reader is referred to Morse (1980) or Zitzler & Thiele (1999)
for details.
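
The reduction step can be sketched with a naive average linkage procedure: every point starts as its own cluster, the two clusters with the smallest average pairwise distance are merged until N' clusters remain, and each cluster is represented by the member with the smallest summed distance to the other members. This is an illustrative reading of the method described by Zitzler & Thiele (1999), not the authors' code; distances are assumed to be Euclidean distances in objective space and the function name is hypothetical.

```python
import math
from itertools import combinations
from typing import List, Sequence


def reduce_by_clustering(points: List[Sequence[float]], max_size: int) -> List[int]:
    """Return indices of at most max_size representative points."""
    if max_size < 1:
        raise ValueError("max_size must be at least 1")
    clusters = [[i] for i in range(len(points))]

    def avg_dist(c1: List[int], c2: List[int]) -> float:
        total = sum(math.dist(points[a], points[b]) for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > max_size:
        # Find the pair of clusters with minimal average linkage distance.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda ab: avg_dist(clusters[ab[0]], clusters[ab[1]]),
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # a < b, so index a stays valid

    representatives = [
        min(c, key=lambda i: sum(math.dist(points[i], points[j]) for j in c))
        for c in clusters
    ]
    return sorted(representatives)
```

The sketch recomputes all linkage distances in every iteration, which is adequate for external sets of a few dozen points but would need caching for larger sets.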



5 Strength Pareto Approach with Targeting 

Compared to other methods such as the Energy Minimization Evolutionary
Algorithm (EMEA) (cf. Jonathan, Zebulum, Pacheco & Vellasco (2000)),
the SPEA has two major advantages: it finds the whole Pareto-front
and not just a single point on it, and it converges faster. The latter is a
universal advantage, whereas the former is not. There are applications where a target
value can be specified. One then wants to find the point on the Pareto-front
which is closest to the user-specified target (in objective space). This eliminates
the need to analyze all the points found by SPEA in order to make a decision.
EMEA offers such a possibility, but it converges more slowly than SPEA and it is
unable to find more than one point per run. Hence we wish to extend SPEA with
a targeting facility that can be switched on and off depending on whether
one is looking for a single solution or the whole front, respectively. We added
this capability to SPEA by the following changes to the algorithm:

1. Between Step 6 and Step 7 the fitness values of all individuals in P and P'
are scaled by the distance D of the individual from the target (in objective
space) to some power q:

$$
f'_i = f_i \cdot D_i^{q}
$$

This ensures that enough nondominated members close to the target will be
found so that the one with minimal distance will appear at higher probability.
The parameter q determines the sharpness of the concentration around the
target.

2. Another external storage Pbest is added, which always contains the individual
out of P' which is closest to the target. Therefore, between steps 4 and 5,
the algorithm calculates the distances of all members of P' to the target
and picks the one with minimal distance into Pbest. At all times, Pbest only
contains one solution.

3. At the end of the algorithm, not only the Pareto-front is output, but also the
solution stored in Pbest. Note that due to clustering and removal in P', the
solution in Pbest is not necessarily contained in P'. It is therefore an optimal
solution which otherwise would not have appeared in the output.
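
The targeting changes can be sketched in a few lines. The sketch assumes Euclidean distance in objective space, represents individuals only by their objective vectors, and keeps the previously stored best solution unless a closer one appears; these details and the function names are illustrative assumptions.

```python
import math
from typing import List, Optional, Sequence

Vector = Sequence[float]


def apply_targeting(fitness: List[float], objs: List[Vector],
                    target: Vector, q: float) -> List[float]:
    """Scale each (minimized) fitness value by its distance to the target to the power q."""
    return [f * math.dist(o, target) ** q for f, o in zip(fitness, objs)]


def update_best(ext_objs: List[Vector], target: Vector,
                best: Optional[Vector] = None) -> Vector:
    """Return the objective vector closest to the target among ext_objs and best.

    Assumes that ext_objs is non-empty or that best is given.
    """
    candidates = list(ext_objs)
    if best is not None:
        candidates.append(best)
    return min(candidates, key=lambda o: math.dist(o, target))
```

Since fitness is minimized, multiplying by the distance favours individuals near the target, and a larger q sharpens this preference.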

The algorithm has been implemented and tested for convex and nonconvex
test functions. Figures 1 to 4 show some results for the nonconvex test function
$T_2$ as proposed in Zitzler, Thiele & Deb (2000):

$$
\begin{aligned}
\text{Minimize}\quad & T_2(x) = \bigl(f_1(x_1), f_2(x)\bigr) \\
\text{subject to}\quad & f_2(x) = g(x_2, \ldots, x_m) \cdot h\bigl(f_1(x_1), g(x_2, \ldots, x_m)\bigr) \\
\text{where}\quad & x = (x_1, \ldots, x_m) \\
& f_1(x_1) = x_1 \\
& g(x_2, \ldots, x_m) = 1 + 9 \cdot \sum_{i=2}^{m} x_i / (m - 1) \\
& h(f_1, g) = 1 - (f_1 / g)^2
\end{aligned}
\tag{7}
$$

where m is the dimension of the parameter space and $x_i \in [0, 1]$. The exact
Pareto-optimal front is given by g(x) = 1. The parameters of the algorithm were
set as summarized in table 1.
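
For reference, the test function can be written down directly. The sketch below follows the definition of the nonconvex test function of Zitzler, Thiele & Deb (2000); the function name `t2` and the list representation of x are illustrative choices.

```python
from typing import Sequence, Tuple


def t2(x: Sequence[float]) -> Tuple[float, float]:
    """Evaluate both (minimized) objectives of the nonconvex test function.

    x must have at least two components, each in [0, 1].
    """
    m = len(x)
    f1 = x[0]
    g = 1.0 + 9.0 * sum(x[1:]) / (m - 1)
    h = 1.0 - (f1 / g) ** 2
    return f1, g * h
```

On the Pareto-optimal front all components except the first are zero, so g = 1 and the second objective reduces to 1 minus the square of the first.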

The chosen target value is slightly off-front. Therefore, the targeting error will
never be zero. Figure 1 shows the final population after 250 generations without
targeting. The diamonds indicate members of the external nondominated set
(Pareto-optimal front), whereas members of the regular population are denoted
by crosses. In figure 2 the same run has been repeated with targeting. Figure 3
shows the targeting error as a function of the generation number. The dashed
line indicates the theoretical minimum of the distance. After about 80 to 100
generations, the point on the front which is closest to the target has been found
with good accuracy. Figure 4 shows the path of Pbest towards the target. The
jumps are due to the fact that the individual stored in Pbest gets replaced as
soon as another individual is closer to the target.

Table 1. Settings for targeting SPEA

Parameter                             Value
Dimension of parameter space (m)      5
Size of population (λ)                50
Size of mating pool (μ)               30
Size of nondominated set (N')         70
Number of generations                 250
Target value for (f1, f2)             (0.5, 0.7)
Concentration parameter q             4




Fig. 1. Final population without targeting
Fig. 2. Final population with targeting



The best objective value that was achieved was f(Pbest) = (0.5265, 0.7247);
its Euclidean distance from the target is $3.6287 \cdot 10^{-2}$, which is equal to the
theoretical minimal distance within the given computational accuracy.



Fig. 3. Distance between Pbest and target
Fig. 4. Path of Pbest towards the target



6 MicroChannel Flow Optimization

Both single and multiobjective EAs have been applied to a fluidic microchannel
design problem. Bio-analytical applications require long thin channels for
DNA sequencing by means of electrophoresis. In order to pack a channel of
several meters in length onto a small square plate, curved geometries are required.
However, curved channels introduce dispersion and therefore limit the separation
efficiency of the system. The question is now how to shape the contour of
the channel in order to minimize dispersion. A detailed description of the problem
as well as an optimization solution using gradient methods can be found in
Mohammadi et al. (2000).

6.1 Single Objective Optimization 



The goal of this optimization run was to minimize the final skewness of the flow
inside the channel, i.e. it was required that the iso-values of the advected species
a be normal to the flow field U by time T when they exit the channel. The
objective function defined by Mohammadi et al. (2000) is therefore:

$$
J = \int_{\Gamma} \bigl(\nabla a(x, T) \times U(x)\bigr)^2 \, dx
\tag{8}
$$

with $\Gamma$ being the cross section of the channel exit. The shape of the 90-degree
turn is described by 11 parameters. Therefore, the parameter search space is of
dimension 11. The objective space is scalar since it is a single objective problem.

The calculation of the flow field and evaluation of the objective function was
done by an external flow solver provided by Mohammadi et al. (2000). Both
a (1+1)-ES and a (3/3_I, 12)-CMA-ES were applied to the problem and their
convergence was compared. The results were statistically averaged from 5 runs
with different initial conditions, i.e. starting points.

Since the CMA-ES has a population size of 12, it performs 12 function
evaluations per generation. Figure 5 shows the convergence normalized to the same
number of function calls. Figures 6 and 7 show the corresponding solutions after
20 and 180 generations of the best (1+1) run out of the ensemble (the lines are
iso-potential lines of the electric field). After 20 generations the contour of the
channel gets a clearly visible dent in it. After 80 evaluations of the objective
function, the algorithm has found a double-bump shape to be even better and
after 180 calls to the solver, no further significant improvement is observed. The
value of the objective function has dropped to about 10“® for the best run out 
of the ensemble. This means that dispersion is almost zero and the channel will
have very good separation properties.

Fig. 5. Convergence of (3,12)-CMA-ES [solid line] and (1+1)-ES [dashed line] vs.
number of evaluations of the objective function (5 runs averaged each)

Fig. 6. Solution at generation 20
Fig. 7. Solution at generation 180



6.2 Multiobjective Optimization 

We then introduced the total deformation of the channel contour as a second
objective to be minimized simultaneously in order to minimize manufacturing
costs. The second objective thus reads:

$$
k = \sum_i p_i^2
\tag{9}
$$

where $p_i$ are the shape parameters of the channel as introduced by Mohammadi
et al. (2000). The first objective remained unchanged. The algorithm
used for this optimization was a SPEA with a population size of 20, a maximum
size of the external nondominated set of 30 and a mating pool of size 10.

Figure 8 shows the Pareto-optimal trade-off front after 80 generations of the
algorithm and figures 9 and 10 show the corresponding solutions, i.e. optimized
shapes of the channel. One is now free to choose whether to go for minimal
skewness at the expense of a higher deformation (cf. figure 9), choose some
intermediate result or minimize deformation in order to minimize manufacturing
costs and still get the lowest skewness possible with the given amount of
deformation (cf. figure 10).



Fig. 8. Pareto-front of nondominated solutions after 80 generations



Fig. 9. Solution at point 1
Fig. 10. Solution at point 3



6.3 Comparison with Gradient Based Methods

Figures 11 and 12 show two classes of optimized shapes obtained by Mohammadi
et al. (2000) using gradient methods. It is interesting that the gradient
technique offers the same two designs, namely the single-dented (fig. 11) and the
double-dented (fig. 12) shapes, which we found with the evolution strategy after
40 or 180 generations, respectively. Therefore, we obtain qualitatively similar
results from both methods. Using the gradient method, the skewness is reduced
by one order of magnitude (Mohammadi et al. (2000)), which is comparable to
the numbers obtained by evolutionary optimization. While trial and error
procedures were used in the gradient methods to obtain various solutions, evolution
strategies provide us with a number of solutions (Pareto front) in a fully
automated fashion. Unlike the gradient based methods, which require an explicit
formulation of the optimization problem at hand, the evolution strategy provides
a straightforward optimization procedure. Moreover, the small cost of
computation implies that evolution strategies are a reliable method leading to greater
flexibility and shorter “time-to-solution”.

Fig. 11. First optimized shape using gradient methods
Fig. 12. Second optimized shape using gradient methods



7 Conclusions and Future Work 

Single and multiobjective evolutionary algorithms have been implemented and
assessed. The SPEA has successfully been extended to support targeting in
objective space. It has been shown that these algorithms are easy to apply to fluid
dynamical problems and that their solutions are comparable to those found by
gradient based methods. In cases where gradient methods cannot be applied
or where they would involve too complex mathematical calculations, evolution
strategies are a good alternative to solve an optimization problem or reduce the
time needed to do so as part of hybrid processes.

Future and present work addresses the acceleration of convergence of these
algorithms and their implementation in hybrid processes.


Abstract. The main objectives of cancer treatment in general, and of cancer
chemotherapy in particular, are to eradicate the tumour and to prolong the 
patient survival time. Traditionally, treatments are optimised with only one 
objective in mind. As a result of this, a particular patient may be treated in the 
wrong way if the decision about the most appropriate treatment objective was 
inadequate. To partially alleviate this problem, we show in this paper how the 
multi-objective approach to chemotherapy optimisation can be used. This 
approach provides the oncologist with versatile treatment strategies that can be 
applied in ambiguous cases. However, the conflicting nature of treatment 
objectives and the non-linearity of some of the constraints imposed on treatment 
schedules make it difficult to utilise traditional methods of multi-objective 
optimisation. Evolutionary Algorithms (EA), on the other hand, are often seen 
as the most suitable method for tackling the problems exhibiting such 
characteristics. Our present study proves this to be true and shows that EA are 
capable of finding solutions undetectable by other optimisation techniques. 



1 Introduction 

Cancer chemotherapy is a highly complex process which controls tumour
development by the administration of a cocktail of chemicals in a series of doses over
a course of treatment. There is a wide variety of anti-cancer drugs available to
oncologists. Due to their high toxicity, these drugs give rise to a variety of
side-effects, ranging from cosmetically undesirable through debilitating to
effects that are themselves life threatening. The oncologist therefore is faced with the
complex task of designing a therapy which achieves certain treatment goals whilst
limiting the toxic side-effects of the treatment to an acceptable level.

In the treatment of most common cancers multi-drug combinations are usually
used. Traditionally, combination treatments are developed through empirical trials of
different combinations, dosing, schedules and sequencing. However, since around 35
drugs are in common clinical use nowadays [17], it is evident that an almost infinite
number of treatment schedules are conceivable and that the need for the optimisation
of chemotherapeutic treatment is indisputable. The number of combinatorial
possibilities for multi-drug schedules, coupled with the conflicting nature [13] and
non-linearity of the constraints imposed on cancer treatments, make it difficult to solve the
problem of cancer chemotherapy optimisation by means of empirical clinical
experimentation or by means of traditional optimisation methods [6]. An alternative
approach is to use evolutionary methods of computational optimisation to search for
multi-drug treatment schedules that achieve certain treatment objectives and satisfy a
number of simultaneous constraints.

A body of work has been established by the authors [7], [8], [10] and [11], where 
they have applied Genetic Algorithms to find the best (or at least suitable) treatment 
strategies given a single optimisation objective. In this paper, however, we endeavour 
to develop this approach further and to address the problem of finding treatment 
strategies that show a good performance with respect to more than one treatment 
objective. Thus, the evaluation of different treatment strategies will involve multiple 
measures (objectives) of performance, which should be optimised simultaneously, 
even though they may be conflicting in nature. The presence of conflicting objectives 
gives rise to a set of optimal solutions, known as the Pareto-optimal set. If all 
objectives are equally important, the conflict between them requires a compromise to 
be reached. A good solution to such problems involving conflicting objectives and 
therefore multiple evaluation criteria, should offer suitable, though possibly sub- 
optimal in the single-objective sense, performance in all objective dimensions [14]. 
Generally, there exists a multitude of such solutions; hence, the algorithm used to 
solve a multi-objective optimisation problem should find a wide variety of them, 
instead of just one. 

Evolutionary Algorithms (EA) are a promising choice for solving the multi- 
objective optimisation problem of cancer chemotherapy for a number of reasons. 
Firstly, a set of Pareto-optimal solutions can, in principle, be captured in an EA 
population, thereby approximating the Pareto-optimal set in a single simulation run 
[2]. Secondly, in general Evolutionary Algorithms are less susceptible to the shape or 
continuity of the Pareto front than other techniques of multi-objective optimisation 
[16]. Thirdly, it has been shown by the authors (see [10] and [11]) that the problem of 
optimising cancer chemotherapy treatment belongs to the class of complex 
optimisation problems involving such features as discontinuity, multi-modality, non- 
connected, non-convex feasible regions, and inaccuracy in establishing model 
parameters. This is precisely the problem area where the methods of evolutionary 
computation really distinguish themselves from their competitors, thereby reinforcing 
the potential effectiveness of Evolutionary Algorithms in multi-objective optimisation 
of chemotherapeutic treatment. 

The remaining sections are organised as follows. In section 2 we provide the 
background information on optimisation of chemotherapeutic treatment, which 
includes medical aspects of chemotherapy, the formulation of treatment design as a 
constrained multi-objective optimisation problem, and a description of salient features 
of Evolutionary Algorithms used in multi-objective optimisation. Section 3 explains 
implementation details of the evolutionary search for Pareto-optimal treatment 
schedules. The results of chemotherapy optimisation and their analysis are given in 
Section 4. Finally, Section 5 summarises the contribution of the present study to 
cancer chemotherapy and outlines possible directions for its further development. 

2 Optimisation of Chemotherapeutic Treatment

Amongst the modalities of cancer treatment, chemotherapy is often considered as 
inherently the most complex [17]. As a consequence of this, it is extremely difficult to 
find effective chemotherapy treatments without a systematic approach. In order to 
realise such an approach, we need to take into account the medical aspects of cancer 
treatment. 



2.1 Medical Aspects of Chemotherapy 

Drugs used in cancer chemotherapy all have narrow therapeutic indices. This means 
that the dose levels at which these drugs significantly affect a tumour are close to those
levels at which unacceptable toxic side-effects occur. Therefore, more effective 
treatments result from balancing the beneficial and adverse effects of a combination 
of different drugs, administered at various dosages over a treatment period. 

The beneficial effects of cancer chemotherapy correspond to treatment objectives 
which oncologists want to achieve by means of administering anti-cancer drugs. A 
cancer chemotherapy treatment may be either curative or palliative. Curative 
treatments attempt to eradicate the tumour. It is believed that chemotherapy alone 
cannot eradicate cancer, but if the overall tumour burden is held below a certain level, 
other mechanisms (e.g. immune system or programmed cell death) will remove 
remaining tumour cells. Palliative treatments, on the other hand, are applied only 
when a tumour is deemed to be incurable. Here the objective is to maintain a 
reasonable quality of life for as long as possible. 

The adverse effects of cancer chemotherapy stem from the systemic nature of this 
treatment: drugs are delivered via the bloodstream and therefore affect all body 
tissues. Since most anti-cancer drugs are highly toxic, they inevitably cause damage 
to sensitive tissues elsewhere in the body. In order to limit this damage, toxicity 
constraints need to be placed on the amount of drug applied at any time interval, on 
the cumulative drug dosage over the treatment period, and on the damage caused to 
various sensitive tissues [17]. In addition to toxicity constraints, the tumour size (i.e. 
the number of cancerous cells) must be maintained below a lethal level during the 
whole treatment period for obvious reasons. 

The goal of cancer chemotherapy therefore is to achieve the beneficial effects of
treatment objectives without violating any of the abovementioned constraints. This
problem would not be much different from that of a general class of constrained
optimisation problems, were it not for the conflict between treatment objectives. The
objectives of curative and palliative treatments conflict with each other in the sense
that drug schedules which tend to minimise tumour size are highly toxic and therefore
have a negative effect on the patient’s quality of life. Moreover, it has been shown
that a severe treatment schedule that fails to cure can result in a shorter patient
survival time (PST) than a milder palliative treatment [6].

Previously, the conflict between objectives was resolved by addressing each of
them separately, that is, treatment strategies were sought which optimised only one of
the objectives without considering the other [7]. The choice of the best strategy was
left to the decision maker, i.e. the practicing oncologist who treats the patient; the role
of the optimiser was to provide the alternatives to choose from. Although this
approach produced some interesting results, it cannot show the whole picture. In
particular, the single-objective approach is ineffective in finding versatile treatment
schedules that show a reasonably good performance in one objective dimension and,
at the same time, can be effectively used with the other objective in mind.

We contend that such versatile treatment schedules will belong to the Pareto- 
optimal set, which needs to be found by the optimisation algorithm capable of dealing 
with multi-objective optimisation. We also contend that Evolutionary Algorithms are 
well-suited for this role. However, before EA can be applied to the multi-objective 
optimisation problem of cancer chemotherapy, we need to mathematically formulate 
the objectives of chemotherapeutic treatment and the constraints imposed on it. 



2.2 Problem Definition and Related Concepts 

In general, a multi-objective optimisation problem (MOP) consists of n decision
variables comprising a decision vector $x = (x_1, x_2, \ldots, x_n) \in \Omega \subset \mathbb{R}^n$, m constraints
$g_1(x), g_2(x), \ldots, g_m(x)$, and k objectives expressed as (non)linear criteria or
objective functions $f_1(x), f_2(x), \ldots, f_k(x)$. Brought together, the multiple objectives
define the evaluation function $E(f_1(x), f_2(x), \ldots, f_k(x)) : \Omega \to \Lambda \subset \mathbb{R}^k$, which, if
some of the objectives are in conflict, places a partial, rather than normal, ordering on
the search space $\Omega$ [14]. In order to mathematically define this partial ordering, a
notion of Pareto dominance is introduced in the objective space $\Lambda$. (NOTE. In this
paper we will be concerned with the problem of maximising the values of the
objective functions.)

Definition 1. A decision vector $x = (x_1, x_2, \ldots, x_n)$ is said to dominate
$x' = (x'_1, x'_2, \ldots, x'_n)$, denoted as $x \succ x'$, iff the value of the evaluation
function at x, E(x), is partially greater than E(x'),
i.e., $\forall i \in \{1, \ldots, k\}: f_i(x) \ge f_i(x') \;\wedge\; \exists i \in \{1, \ldots, k\}: f_i(x) > f_i(x')$.

The specificity of multi-objective optimisation is to find a set of non-dominated 
decision vectors rather than the global optimum, which might not even exist. For this 
purpose, the concept of Pareto optimality ought to be used. 

Definition 2. The decision vector $x \in \Omega$ is Pareto-optimal iff x is non-dominated
regarding $\Omega$; formally
$\neg(\exists\, x' \in \Omega : x' \succ x)$

Pareto-optimal decision vectors cannot be improved in any objective without
causing deterioration of at least one other objective. Such decision vectors comprise
the Pareto-optimal set, $P^* \subseteq \Omega$, in the search space. The mapping of the
Pareto-optimal set to the objective function space gives rise to the Pareto front $PF^*$. The
Pareto front can be non-convex and non-connected; nonetheless, if it is known, or at
least approximated reasonably well, the decision maker will be able to select a
solution via a choice of acceptable objective performance and, as a result of this, the
problem of multi-objective optimisation will be resolved.
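
The dominance relation and the non-dominated set defined above can be illustrated with a short sketch. It assumes maximisation of every objective, as stated in the note above; the function names and the sample points are hypothetical and serve only as an illustration.

```python
from typing import List, Sequence


def dominates(fa: Sequence[float], fb: Sequence[float]) -> bool:
    """True if objective vector fa dominates fb under maximisation:
    fa is no worse in every objective and strictly better in at least one."""
    no_worse = all(a >= b for a, b in zip(fa, fb))
    strictly_better = any(a > b for a, b in zip(fa, fb))
    return no_worse and strictly_better


def non_dominated(points: List[Sequence[float]]) -> List[Sequence[float]]:
    """Keep only the points that no other point in the list dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical two-objective values (f1, f2) for four candidate solutions.
candidates = [(1.0, 5.0), (2.0, 4.0), (1.5, 3.0), (3.0, 1.0)]
front = non_dominated(candidates)
```

Here the point (1.5, 3.0) is dominated by (2.0, 4.0), while the remaining three points are mutually non-dominated and so approximate a Pareto front.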

Therefore, in order to solve the optimisation problem of cancer chemotherapy, we 
need to find the set of treatment schedules, which yields the Pareto front in the 
treatment performance space. This will allow the oncologist to make a decision on 
which treatment schedule to use, given his/her preferences or certain priorities. In the 
remainder of this section we will define the decision vectors and the search space for 
the cancer chemotherapy optimisation problem, specify the constraints, and 
particularise the optimisation objectives. 

Anti-cancer drugs are usually delivered according to a discrete dosage program in
which there are n doses given at times $t_1, t_2, \ldots, t_n$ [6]. In the case of multi-drug
chemotherapy, each dose is a cocktail of d drugs characterised by the concentration
levels $C_{ij}$, $i = 1, \ldots, n$, $j = 1, \ldots, d$, of anti-cancer drugs in the blood plasma.
Optimisation of chemotherapeutic treatment is achieved by modification of these variables.
Therefore, the search space $\Omega$ of the chemotherapy optimisation problem is the set of
control vectors $\mathbf{c} = (C_{ij})$ representing the drug concentration profiles.

However, not all of these profiles will be feasible as chemotherapy treatment must 
be constrained in a number of ways. Although the constraint sets of 
chemotherapeutic treatment vary from drug to drug as well as with cancer type, they 
have the following general form. 

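1. Maximum instantaneous dose for each drug acting as a single agent:

$$g_1(\mathbf{c}) = \{\, C_{\max j} - C_{ij} \ge 0 ;\ \forall i \in \overline{1,n},\ \forall j \in \overline{1,d} \,\} \qquad (1)$$

2. Maximum cumulative dose for drug acting as a single agent:

$$g_2(\mathbf{c}) = \Big\{\, C_{\mathrm{cum}\, j} - \sum_{i=1}^{n} C_{ij} \ge 0 ;\ \forall j \in \overline{1,d} \,\Big\} \qquad (2)$$

3. Maximum permissible size of the tumour:

$$g_3(\mathbf{c}) = \{\, N_{\max} - N(t_i) \ge 0 ;\ \forall i \in \overline{1,n} \,\} \qquad (3)$$

4. Restriction on the toxic side-effects of multi-drug chemotherapy:

$$g_4(\mathbf{c}) = \Big\{\, C_{\text{s-eff}\, k} - \sum_{j=1}^{d} \eta_{kj} C_{ij} \ge 0 ;\ \forall i \in \overline{1,n},\ \forall k \in \overline{1,m} \,\Big\} \qquad (4)$$
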



The factors $\eta_{kj}$ in the last constraint represent the risk of damaging the k-th organ or
tissue (such as heart, bone marrow, lung etc.) by administering the j-th drug.
Estimates of these factors for the drugs most commonly used in treatment of breast
cancer, as well as the values of maximum instantaneous and cumulative doses, can be
found in [4], [8] or [11].
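
As an illustration of how the dose-related constraints (instantaneous dose, cumulative dose and side-effects) might be checked for a candidate schedule, consider the following sketch. All names and the numbers in the usage example are hypothetical; the tumour-size constraint is left out because it needs the tumour growth model rather than the doses alone.

```python
from typing import List, Sequence, Tuple


def dose_violations(
    c: Sequence[Sequence[float]],      # c[i][j]: concentration of drug j at dose i
    c_max: Sequence[float],            # maximum instantaneous dose per drug
    c_cum: Sequence[float],            # maximum cumulative dose per drug
    eta: Sequence[Sequence[float]],    # eta[k][j]: risk of drug j to organ k
    c_seff: Sequence[float],           # side-effect limit per organ
) -> List[Tuple]:
    """Return a list describing every violated dose constraint (empty if feasible)."""
    n, d = len(c), len(c_max)
    found: List[Tuple] = []
    for i in range(n):
        for j in range(d):
            if c[i][j] > c_max[j]:
                found.append(("max_dose", i, j))
    for j in range(d):
        if sum(c[i][j] for i in range(n)) > c_cum[j]:
            found.append(("cum_dose", j))
    for i in range(n):
        for k, risks in enumerate(eta):
            if sum(risks[j] * c[i][j] for j in range(d)) > c_seff[k]:
                found.append(("side_effect", i, k))
    return found


# Hypothetical two-dose, two-drug schedule and limits.
schedule = [[10.0, 2.0], [12.0, 0.0]]
problems = dose_violations(
    schedule, c_max=[15.0, 5.0], c_cum=[20.0, 5.0],
    eta=[[0.1, 0.5]], c_seff=[3.0],
)
```

In this made-up example only the cumulative limit of the first drug is exceeded (10 + 12 > 20), so `problems` contains a single entry.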

Regarding the objectives of cancer chemotherapy, we focus our study on the 
following two. The primary objective is to eradicate the tumour (curative treatment). 
We define eradication to mean a reduction of the tumour from the initial size to a size 
below 10^ cells. Clinical experience shows that other mechanisms (e.g. programmed 
cell death, a.k.a. apoptosis) are capable of removing remaining tumour cells at this 
point. 

In order to simulate the response of a tumour to chemotherapy, a number of 
mathematical models can be used [10]. The most popular is the Gompertz growth 
model with a linear cell-loss effect [17], which has been extensively validated in 
clinical trials: 



$$\frac{dN}{dt} = N(t)\left[\lambda \ln\!\left(\frac{\Theta}{N(t)}\right) - \sum_{j=1}^{d} \kappa_j \sum_{i=1}^{n} C_{ij}\,\{H(t - t_i) - H(t - t_{i+1})\}\right] \qquad (5)$$

where $N(t)$ represents the number of tumour cells at time $t$; $\lambda$, $\Theta$ are the
parameters of tumour growth; $H(t)$ is the Heaviside step function; $\kappa_j$ are the
quantities representing the efficacy of anti-cancer drugs; and $C_{ij}$ denote the
concentration levels of these drugs. One advantage of the Gompertz model from the
computational optimisation point of view is that the equation (5) yields an analytical
solution after the substitution $u(t) = \ln(\Theta / N(t))$ [5]. Since $u(t)$ increases when $N(t)$
decreases, the primary optimisation objective of tumour eradication can be formulated
as follows [9]:



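$$\underset{\mathbf{c}}{\text{maximise}} \quad f_1(\mathbf{c}) = \int_{t_1}^{t_n} \ln\!\left(\frac{\Theta}{N(\tau)}\right) d\tau \qquad (6)$$

subject to the state equation (5) and the constraints (1)-(4).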
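
To make the analytical solution concrete, here is a minimal sketch under stated assumptions: the doses are held constant over equally long intervals of length `dt`, so that within one interval the substituted variable u = ln(Θ/N) obeys du/dt = -λu + s with s the summed kill rate of all drugs in that interval. This has the closed form u(t) = s/λ + (u0 - s/λ)·exp(-λt), which can also be integrated exactly to accumulate the first objective. The function names and all parameter values are hypothetical, not taken from the paper.

```python
import math
from typing import List, Sequence, Tuple


def simulate_gompertz(
    u0: float,                          # initial value of u = ln(theta / N)
    lam: float,                         # growth parameter lambda
    kappa: Sequence[float],             # efficacy of each drug
    doses: Sequence[Sequence[float]],   # doses[i][j]: concentration of drug j in interval i
    dt: float,                          # length of each treatment interval
) -> Tuple[List[float], float]:
    """Return u at the end of each interval and the integral of u over the treatment."""
    u = u0
    integral = 0.0
    trace: List[float] = []
    decay = math.exp(-lam * dt)
    for row in doses:
        s = sum(k * c for k, c in zip(kappa, row))  # total cell-kill rate this interval
        u_eq = s / lam                               # level u would settle at
        integral += u_eq * dt + (u - u_eq) * (1.0 - decay) / lam
        u = u_eq + (u - u_eq) * decay
        trace.append(u)
    return trace, integral


def tumour_size(u: float, theta: float) -> float:
    """Invert the substitution: N = theta * exp(-u)."""
    return theta * math.exp(-u)


# Hypothetical parameters: one drug, three intervals, no treatment in the second.
u_trace, f1 = simulate_gompertz(
    u0=2.0, lam=0.5, kappa=[1.0], doses=[[1.5], [0.0], [1.5]], dt=1.0
)
```

Because each interval is solved in closed form, no numerical ODE integration is needed, which keeps the evaluation of a candidate schedule cheap inside an evolutionary search.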

The second objective of cancer chemotherapy is to prolong the patient survival 
time (PST) maintaining a reasonable quality of life during the palliation period. If we 
denote the PST as T , then the second objective becomes: 



$$\underset{\mathbf{c}}{\text{maximise}} \quad f_2(\mathbf{c}) = T \qquad (7)$$

again subject to (1)-(5).

Therefore, the evaluation function of the multi-objective optimisation problem of
cancer chemotherapy takes the form of a two-dimensional vector function
$F(\mathbf{c}) = [f_1(\mathbf{c}), f_2(\mathbf{c})]^T$, which maps the decision vectors
$\mathbf{c} \in \Omega$ to the objective function space $\Lambda \subset \mathbb{R}^2$ using
the objectives (6) and (7).

As we mentioned in the previous section, these objectives are conflicting in nature.
The conflict between objectives manifests itself in the fact that small tumours are 
more likely to be successfully eliminated, whereas it is much easier to palliate a large 
tumour [6]. Thus, in order to pursue the first treatment objective the maximum 
tolerable amount of drugs has to be administered at the start of treatment. The best 
palliative strategy, on the other hand, is to allow the tumour to grow up to the 
maximum size and then to maintain it at that level using only a necessary amount of 
drugs. 

Taking this into account and considering the number of constraints imposed on 
chemotherapeutic treatment, it is not difficult to see that the traditional approaches to 
multi-objective optimisation of cancer chemotherapy (such as, for example, the
weighting or constraint methods) are likely to fail. Our previous experiments with 
traditional optimisation methods (the complex and Hooke & Jeeves techniques) 
showed a lack of robustness in finding feasible solutions even in the case of single- 
objective optimisation [11]. Moreover, all traditional methods require several 
optimisation runs to obtain an approximation of the Pareto-optimal set. As the runs 
are performed independently from each other, synergies between them cannot be 
easily exploited, which may lead to substantial computational overhead [16]. 

Therefore, the necessity of a specialised optimisation technique to deal with the 
cancer chemotherapy MOP is evident. Recently, Evolutionary Algorithms (EA) have 
become established as an alternative to traditional methods. The major advantages of 
EA are: 1) the ability to effectively search through large solution spaces; 2) the ability
to overcome the difficulties faced by the traditional methods mentioned above; and 
3) the ability to approximate the Pareto-optimal set in a single run. In the following 
section we briefly discuss the salient features of Evolutionary Algorithms. 



2.3 Evolutionary Multi-objective Optimisation 

Evolutionary Algorithms entail a class of stochastic optimisation methods that 
simulate the process of natural selection. Although the underlying principles are quite 
simple, these algorithms have proven to be in general robust and powerful [1]. A 
large number of applications of EA to hard, real-world MOPs, the survey of which is 
given in [2], suggest that multi-objective optimisation of cancer chemotherapy is the 
problem set where Evolutionary Algorithms might excel. 

As with any MOP, the problem of cancer chemotherapy optimisation involves two 
independent processes. The first process is the search through the solution space for 
the Pareto-optimal set. The search space of cancer chemotherapy MOP is very large 
[7], which makes the multi-directional and synergetic features of EA extremely 
helpful. The second process is decision-making, i.e. the selection of a suitable 
compromise solution from the Pareto-optimal set. 

Depending on the order of performing these processes, the preferences of the
decision maker (the oncologists in our case) can be made known either before, during 
or after the search process [14]. In the case of a priori preference articulation, the 
objectives of the given MOP are aggregated into a single objective that implicitly 
includes preference information (in the form of objective weights for example). This 
approach requires profound domain knowledge, which is not available for the 
optimisation problem of cancer chemotherapy [3]. 

If the search process is performed without any preference information given by the
oncologist, then we are applying a posteriori preference articulation. Here, the search 
results in a set of candidate treatment schedules (ideally the Pareto-optimal set of 
treatments), from which the final choice is made by the oncologist. The main 
drawback of the latter approach is that it entirely excludes the domain knowledge, 
which in some cases might substantially reduce the size of the search space or/and its 
complexity. However, in a general case of cancer chemotherapy optimisation such a 
reduction is not advisable [8], which supports the suitability of the a posteriori 
approach. 

Also, the process of decision-making may overlap with that of search. This means 
that after each optimisation step, a number of alternative treatment schedules 
(temporary Pareto-optimal set) are presented, on the basis of which the oncologist 
specifies further preference information, thereby guiding the search process. Such an 
approach is known as progressive preference articulation [14] and is a promising way 
to combine the advantages of the previous two. One example of how it can be used in 
the context of cancer chemotherapy is to optimise the modification of an existing 
treatment schedule rather than a schedule itself [11]. However, in this paper we 
concentrate our efforts on the optimisation of treatment schedules themselves as this 
is a more general problem. In solving this problem we do not wish to restrict the 
search process in any way, since a priori information on whereabouts of the Pareto- 
optimal set in the search space is unavailable. Therefore, hereafter we need to resort 
to a posteriori preference articulation approach to multi-objective optimisation. 

Having established the strategic aspects of the method that is to be utilised for 
solving the cancer chemotherapy MOP, we now need to specify the implementation 
details. A general Evolutionary Algorithm can be presented as follows. 




Fig. 1. Input, Output, and Internal Structure of a Generic Evolutionary Algorithm 

This general structure holds for most EA implementations. The distinctive feature of
Evolutionary Algorithms applied to multi-objective optimisation, however, is that
they require addressing the following specific issues [15]. The first issue is how to
accomplish fitness assignment, and consequently selection, given a vector-valued
evaluation function $F : \Omega \to \Lambda$. In contrast to single-objective optimisation, where
the fitness function takes into account only one optimisation objective, the fitness
function of a multi-objective EA needs to map a k-dimensional objective function
space to scalar numbers in such a way as to guide the search process to the
Pareto-optimal set. Secondly, the diversity of an EA population has to be maintained more
than ever in order to achieve a well distributed and well spread set of non-dominated
solutions, in addition to preventing premature convergence.

A body of work has been established setting up various fitness assignment 
methods, selection techniques, and population diversifying schemes [2], [5], [14], 
[16]. As a consequence of this, many implementations of multi-objective EA are now 
available. In spite of this variety, however, there is no clear guideline on which EA 
implementation is suited to which sort of problem in the sense of ensuring that the 
derived solutions are the best available [15]. Thus, the choice is subjective and is 
often based on the developer’s attempt to integrate the domains of the optimisation 
problem and that of the implementation algorithm [14]. 

Among the different implementation algorithms that have been proposed in the 
literature and have been used by EA practitioners, we have chosen and will base our 
study on the Strength Pareto Evolutionary Algorithm (SPEA) thoroughly described in 
[16]. This algorithm combines promising aspects of various multi-objective EA and 
has shown a superior performance on a number of test problems [15]. In the next 
section we describe how it can be applied to the multi-objective optimisation problem 
of cancer chemotherapy. 



3 Evolutionary Search for Optimal Treatment Schedules 

The search process aiming at finding non-dominated (with respect to the treatment 
objectives specified in Section 2.1) chemotherapy schedules is the main part of 
computational optimisation of chemotherapeutic treatment. The decision-making 
process is, of course, based on the results of this search, but it is left to oncologists 
and therefore lies outside the scope of the present paper. 

The search for non-dominated treatment schedules is accomplished using the
SPEA approach. Multi-drug chemotherapy schedules, represented by decision
vectors $\mathbf{c} = (C_{ij})$, $i \in \overline{1,n}$, $j \in \overline{1,d}$, are encoded as
binary strings. Using the EA terminology, the individual space $I$ (a discretized version of
$\Omega$) can then be expressed as a Cartesian product

$$I = A_1^1 \times A_1^2 \times \ldots \times A_1^d \times A_2^1 \times A_2^2 \times \ldots \times A_2^d \times \ldots \times A_n^1 \times A_n^2 \times \ldots \times A_n^d \qquad (8)$$

of allele sets $A_i^j$. Each allele set uses a 4-bit representation scheme

$$A_i^j = \{\, a_1 a_2 a_3 a_4 : a_k \in \{0,1\}\ \forall k \in \overline{1,4} \,\} \qquad (9)$$

so that each concentration level $C_{ij}$ takes an integer value in the range of 0 to 15
concentration units. In general, with n treatment intervals and up to $2^p$
concentration levels for d drugs, there are up to $2^{npd}$ individual elements.
Henceforth we assume that n = 10 and that the number of available drugs is restricted
to three, one of which is strong but highly toxic, another is medium, and the last one
is less toxic at the expense of reduced efficacy. In our study we experiment with the
following drugs: Taxotere (strong), Adriamycin (medium), and Cisplatinum (weak),
which are commonly used in multi-drug treatment of breast cancer. The values
n = 10 and d = 3 result in the individual (search) space of cardinality $|I| = 2^{120}$
individuals, referred to as chromosomes.

Thus, a chromosome $x \in I$ can be expressed as

$$x = \{\, a_1 a_2 a_3 \ldots a_{4nd} : a_k \in \{0,1\}\ \forall k \in \overline{1,4nd} \,\} \qquad (10)$$

and the mapping function $m : I \to C$ between the individual $I$ and the decision
vector $C$ spaces can be defined as

$$C_{ij} = \Delta C_j \sum_{k=1}^{4} 2^{4-k}\, a_{4d(i-1)+4(j-1)+k}, \quad \forall i \in \overline{1,n},\ j \in \overline{1,d} \qquad (11)$$

where $\Delta C_j$ represents the concentration unit for drug j. This function symbolizes
the decoding algorithm to derive the decision vector $\mathbf{c} = m(x)$ from a chromosome
$x$. If this vector violates any of the constraints (1)-(4), a penalty is applied to the
values of the objective functions. The evaluation function $F$ and the penalties yield
the following augmented objective vector:




on the basis of which the fitness value is assigned to x . 
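
The decoding step described above can be sketched as follows. The sketch assumes that each drug level occupies four consecutive bits, ordered dose by dose and, within a dose, drug by drug, and that the first bit of each group is the most significant; the bit order and the function name `decode` are assumptions made for illustration.

```python
from typing import List, Sequence


def decode(bits: Sequence[int], n: int, d: int,
           delta_c: Sequence[float]) -> List[List[float]]:
    """Map a binary chromosome of length 4*n*d to an n-by-d matrix of
    concentrations: level (0..15) times the concentration unit of each drug."""
    if len(bits) != 4 * n * d:
        raise ValueError("chromosome length must be 4 * n * d")
    schedule: List[List[float]] = []
    for i in range(n):
        row: List[float] = []
        for j in range(d):
            start = 4 * d * i + 4 * j        # first bit of the (i, j) allele
            level = 0
            for bit in bits[start:start + 4]:
                level = 2 * level + bit      # most significant bit first (assumed)
            row.append(delta_c[j] * level)
        schedule.append(row)
    return schedule


# Hypothetical chromosome for one dose of two drugs: 1010 -> 10 units, 0011 -> 3 units.
example = decode([1, 0, 1, 0, 0, 0, 1, 1], n=1, d=2, delta_c=[1.0, 0.5])
```

With the concentration units 1.0 and 0.5 chosen here, the decoded schedule is a single dose of 10.0 for the first drug and 1.5 for the second.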

The fitness assignment procedure is a two-stage process that uses two interacting
populations: the external set $\bar{P}$, which stores the individuals representing a
non-dominated front among all solutions considered so far, and the EA population of
chromosomes $P$. The first stage is to rank the elements of $\bar{P}$, and the second is to
evaluate the chromosomes in $P$. The full description of the fitness assignment
procedure and of other auxiliary SPEA elements is given in [16]. All that remains to
be specified here is the parameter settings of the SPEA algorithm: namely, the
population size $N$, the maximum number $\bar{N}$ of elements in the external set, the
probabilities of crossover ($p_c$) and mutation ($p_m$), and the maximum number of
generations, which serves as the stopping criterion.
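
For orientation, the two-stage fitness assignment can be sketched as below. This follows the strength-based scheme as it is commonly described for SPEA (a strength for each external member, then a fitness for each population member, with lower values being better), adapted to maximisation; it is a simplified illustration with hypothetical names, and [16] remains the authoritative description.

```python
from typing import List, Sequence, Tuple

Objectives = Sequence[float]


def covers(fa: Objectives, fb: Objectives) -> bool:
    """fa covers fb if it is at least as good in every (maximised) objective."""
    return all(a >= b for a, b in zip(fa, fb))


def spea_fitness(external: List[Objectives],
                 population: List[Objectives]) -> Tuple[List[float], List[float]]:
    """Stage 1: strength of each external member = covered population members / (N + 1).
    Stage 2: fitness of each population member = 1 + sum of strengths of the
    external members that cover it. Lower fitness is better."""
    n = len(population)
    strengths = [sum(covers(e, p) for p in population) / (n + 1) for e in external]
    fitness = [
        1.0 + sum(s for e, s in zip(external, strengths) if covers(e, p))
        for p in population
    ]
    return strengths, fitness


# Hypothetical objective vectors (f1, f2).
ext = [(3.0, 1.0), (1.0, 3.0)]
pop = [(2.0, 1.0), (1.0, 1.0), (1.0, 2.0), (0.0, 0.0)]
strengths, fitness = spea_fitness(ext, pop)
```

Population members covered by several external members accumulate a larger (worse) fitness, which pushes the search towards sparsely covered regions of the front.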

In the choice of these parameters we will adhere to the values used in the previous
work [7]. This will allow us to make an unbiased comparison between the single- and
multi-objective approaches to chemotherapy optimisation. Moreover, in [12] the
authors have shown that with the following values of crossover and mutation
probabilities the efficiency of evolutionary search significantly improves. Taking all
these into account, the SPEA parameters will be set to the following values: a maximum of
10000 generations, $p_c = 0.6$ and $p_m = 0.1$. Regarding the $N/\bar{N}$ ratio, we have
chosen it to be 50/5 as the population size $N = 50$ has proved to be efficient in the
previous studies (see [7], [10] and [11]), and the size $\bar{N} = 5$ is deemed to be
sufficient to provide the required density of solutions on the Pareto front without
drastically reducing selection pressure of SPEA.



4 Results 

In order to illustrate the results of cancer chemotherapy optimisation, a number of
typical treatment scenarios are usually considered. For instance, in [7] the authors
apply single-objective optimisation to three scenarios that very often occur in practice:
1) cure possible, eradication treatment is applied; 2) cure impossible, eradication
treatment is applied; 3) cure impossible, palliative treatment is applied. However, when the
single-objective approach is used, it is often necessary to assign priorities to each
objective. Generally, the primary objective of cancer treatment is to eradicate the
tumour. In cases when the eradication is possible (the first treatment scenario in [7]),
different treatment schedules are ranked on the basis of how quickly they can achieve
this goal. If, on the other hand, cure is impossible (the second treatment scenario) but
the treatment objective remains unchanged, then a single-objective optimisation
algorithm is likely to yield a solution far from optimum. In the latter case the
palliative treatment gives much better results [7].

Thus, one strong drawback of single-objective chemotherapy optimisation is that 
the choice of the desired treatment outcome needs to be made when the treatment 
starts and the cost of a mistake may be unacceptably high. In order to overcome this 
difficulty, we now consider the main objectives of chemotherapeutic treatment - 
tumour eradication and prolongation of PST - simultaneously for each potential 
chemotherapy schedule. Figures 2, 3 and 4 show three multi-drug treatment 
schedules from the set of Pareto-optimal decision vectors found by SPEA. 

As can be seen from Table 1, the schedules B and C yield a good value of one
optimisation objective ($f_1(\mathbf{c})$ and $f_2(\mathbf{c})$ respectively) at the expense of
relatively poor performance in the other objective's dimension. This is reminiscent of
single-objective optimisation [7], which would favour the schedule B as a candidate
for the tumour eradication treatment strategy and would reserve the schedule C for
palliative treatment.

From the treatment profile corresponding to the schedule B (see Figure 3) we can 
observe that the good result in tumour eradication is achieved by administering high 
doses of Taxotere and Cisplatinum towards the end of the treatment period. There is 
a danger in doing this however. If the tumour eradication strategy fails to achieve the 
desired outcome, studies show that the tumour can re-grow again and reach the lethal 
size in shorter time than it would have done had a milder palliative strategy been used 
[6]. The treatment schedule C, on the other hand, prolongs the PST to a greater 
extent by keeping the number of tumour cells at an acceptable level (Figure 4 shows 
that this is done by administering relatively small dosages of anti-cancer drugs), but 
might miss a chance to completely eradicate it. 




[Figures 2 to 4: plots of treatment schedules A, B and C. Axes: Dose 1 to Dose 10 against % Cmax; legend: Adriamycin, Taxotere, Cisplatinum.]

Fig. 2. Schedule achieving an agreeable balance between treatment objectives

Fig. 3. Schedule excelling in minimising the size of tumour burden

Fig. 4. Schedule excelling in prolongation of the patient survival time

Table 1 gives the values that quantitatively characterise the treatment schedules 
presented above. 



Table 1. Comparison between different Pareto-optimal treatment schedules

Optimal      Constraints                     f_1(c)   f_2(c)   Average tumour size      PST
treatment    satisfaction                                      (in terms of N_max)
-----------  ------------------------------  -------  -------  -----------------------  --------
Schedule A   All constraints are satisfied   4.3230   4.3370   0.7986                   36 weeks
Schedule B   All constraints are satisfied   5.5864   3.3835   0.7380                   35 weeks
Schedule C   All constraints are satisfied   3.1151   7.0297   0.8998                   38 weeks




The major result of using the multi-objective approach to cancer chemotherapy
optimisation is in finding the schedule A. As can be seen from its values of the
optimisation objectives in Table 1, it is an agreeable compromise between the other
two schedules. Although the schedule A is outperformed by the schedules B and C in
the single-objective sense, we believe that it represents a more versatile treatment
strategy. Our interpretation of its mode of action is that it makes attempts to eradicate
the tumour using high doses of Taxotere - the most efficacious drug available (see
Figure 2). Failing these attempts, the schedule A switches to the palliative regime
similar to that of the schedule C.

The final remark we would like to make in this section is that the information on 
the effects and on the modes of delivery of the specified drugs has been given to us by 
our collaborating oncologists or taken from [4]. In our experiments with single- 
objective optimisation of cancer chemotherapy in [7], [8] and [11], we managed to 
successfully emulate the outcome of actual clinical trials using the mathematical 
model (5) and the constraints (1)-(4). This gives us a reason to believe that the results
presented in this paper are also viable and that our approach to multi-objective 
optimisation of cancer treatment can be used in real life situations. In order to prove 
this we intend to ask clinicians to try our schedules developed for more complex 
treatment scenarios with the help of the Oncology Workbench [7], [8]. 



5 Discussion 

In this paper we addressed the problem of multi-objective optimisation of cancer 
chemotherapy. A number of different objectives of chemotherapeutic treatment can 
be defined. Although some of these objectives need to be considered simultaneously 
in order to develop an effective treatment, in the past they were optimised 
independently from each other. Our present study attempts to cover this gap and 
utilises the evolutionary methods of computational optimisation to find a solution to a 
complex optimisation problem with two conflicting objectives. The solution is sought 
in the form of a Pareto-optimal set, which is approximated by an optimisation run of 
the Strength Pareto Evolutionary Algorithm. The resultant set found during our 
experiments includes not only the treatment schedules discovered in the previous 
studies with the help of the single-objective approach, but a number of new schedules 
as well that have not been detected before. Thus, the multi-objective approach to 
chemotherapy optimisation reveals additional treatment strategies that can be more 
suitable in certain cases, thereby assisting in the decision-making process. The 
number of such strategies increases when more doses are administered during 
treatment or more anti-cancer drugs are used. 

Therefore, one possible direction for future work is to explore more complex 
treatment scenarios. This might necessitate the introduction of additional objectives 
in the evaluation function and the enlargement of the EA population and of the
external set. It will be interesting for us to experiment with different $N/\bar{N}$ ratios in
order to see the effect of this parameter on the effectiveness of the EA search.
Another direction is to develop a software tool on the basis of the Oncology 
Workbench described in [7] and [8] that will allow the oncologist to refine or change 
treatment preferences during the optimisation run. This would integrate the search 
and the decision-making processes, resulting in a more efficient and reliable choice of 
chemotherapeutic treatment. 



Abstract. This paper discusses and compares the methods of Multi Objective
Genetic Algorithm and Multi Objective Simulated Annealing applied to LC 
filter tuning. Specifically, the paper is concerned with the application and 
implementation of these methods to the design of an antenna tuning unit, 
providing the facility to adapt to changes in load impedance, temperature or 
environmental effects, ensuring maximum power transfer and harmonic 
rejection. A number of simulations were carried out to evaluate the relative 
performance of these algorithms. 



1 Introduction 

For radio antenna transmission systems it is of great importance to transmit maximum 
power to the antenna to achieve maximum transmission efficiency. In many cases, for 
example mobile and fixed tactical applications, these systems must deal with changing 
load and environmental aspects. The goal of obtaining fast antenna tuning systems that 
are capable of offering impedance matching whilst maintaining good harmonic 
rejection properties has become increasingly significant. One of the most popular 
impedance matching configurations used is the Pi-Network, which is simple in 
structure, can accommodate a wide range of load impedances and offers high 
harmonic rejection capabilities. Although the Pi-Network may be tuned manually, an 
automatic tuning system offers the advantages of improved tuning times and 
unsupervised tuning of a transmission system. In addition, if an automatic tuner can be 
developed which is capable of re-tuning quickly enough, then the transmission system 
may become dynamic, capable of adjusting to changes in antenna impedance, 
operating frequency or environment in order to maintain optimum performance. The 
Pi-Network’s versatility allows additional criteria to be considered for optimisation i.e. 
harmonic rejection, parasitic effects, component costs etc. However, the contribution 
each component makes to the Pi-Network impedance characteristic is complex. 
Current tuning algorithms in commercially available equipment use a step by step 
approach, where the tuning network is adjusted iteratively, until impedance matching
is achieved [1-5]. This method is slow and focuses only on impedance matching, 
rather than optimising the full capabilities of the Pi-Network. There have been reports 
in the literature of global optimisation techniques that have been applied to a range of 
applications, which consist of multiple objectives and where time constraints are also 
of importance. Previously, the Genetic Algorithm and Simulated Annealing [6]
optimisation algorithms have been applied to this problem; however, an aggregate
fitness function was used, so the full capabilities of the system were not realised.
This paper uses a Pareto-based fitness function to investigate the
characteristics of the system, whilst also investigating the ability of MOEAs to provide
adequate network solutions.



2 The Pi-Network 

Fig. 1 shows the arrangement of the Pi-Network between load and source impedances.
$R_S$ represents the transmitter source impedance (typically 50 ohms resistive), $Z_L$
represents the complex load impedance, while $Z_1$, $Z_2$ and $Z_3$ represent the impedance of
the network at each stage from source to load respectively. In order to achieve
conjugate impedance matching and hence maximum real power transfer from the
transmitter to the load, $Z_3$ must equal the complex conjugate of the load impedance.




Fig. 1. Pi-Network used for Impedance Matching between Source and Load Impedances 



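Simple circuit analysis may be used to formulate the real and imaginary
components of the Pi-Network output impedance, $Z_3 = R + jX$.

$$R = \frac{R_S}{(1 - \omega^2 L C_2)^2 + (\omega C_1 R_S + \omega C_2 R_S - \omega^3 L C_1 C_2 R_S)^2} \qquad (1)$$

$$X = \frac{\omega^3 L C_1^2 R_S^2 + 2\omega^3 L C_1 C_2 R_S^2 + \omega L - \omega C_1 R_S^2 - \omega C_2 R_S^2 - \omega^3 L^2 C_2 - \omega^5 L^2 C_1^2 C_2 R_S^2}{(1 - \omega^2 L C_2)^2 + (\omega C_1 R_S + \omega C_2 R_S - \omega^3 L C_1 C_2 R_S)^2} \qquad (2)$$
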



From these equations it is clear that each component influences the Pi-Network 
output impedance in a non-linear manner. 
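
The non-linear dependence can be explored numerically with complex arithmetic instead of the expanded expressions. The sketch below assumes the usual low-pass Pi topology (source resistance, shunt capacitor C1, series inductor L, shunt capacitor C2, then the load) and computes the impedance seen looking back into the network from the load side; the function name and component values are hypothetical.

```python
import math


def output_impedance(omega: float, L: float, C1: float, C2: float,
                     Rs: float) -> complex:
    """Impedance looking into the Pi-Network output with the source attached.
    Assumed topology: Rs in parallel with C1, in series with L, in parallel with C2."""
    z_a = Rs / (1 + 1j * omega * C1 * Rs)   # Rs in parallel with C1
    z_b = z_a + 1j * omega * L              # add the series inductor
    return z_b / (1 + 1j * omega * C2 * z_b)  # in parallel with C2


# Hypothetical values: 10 MHz, 1 uH, 200 pF, 300 pF, 50 ohm source.
omega = 2 * math.pi * 10e6
z_out = output_impedance(omega, L=1e-6, C1=200e-12, C2=300e-12, Rs=50.0)
R, X = z_out.real, z_out.imag
```

Sweeping one component while holding the others fixed shows how strongly both R and X bend with each parameter, which is the behaviour that makes manual tuning slow.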

Initially the components are considered to be continuous in the range [0, 1];
this will provide insight into the theoretical ideal characteristics of the network.
Further work will then explore the issues of discrete components.



3 Fitness Criteria 

Impedance Matching Criteria: The purpose of the Pi-Network is to create a conjugate
match between the transmitter and antenna. When an impedance mismatch between
the source and load impedances exists, a proportion of the electromagnetic wave
energy transmitted by the source is reflected back from the load impedance. The ratio
of the reflected wave energy to forward wave energy is termed the reflection coefficient,
$\rho$. This relationship between the impedance mismatch and reflection coefficient allows
the reflection coefficient to be used as a measure of impedance matching and may be
expressed by the following equation:

$$\rho = \sqrt{\frac{(R_L - R_S)^2 + X_L^2}{(R_L + R_S)^2 + X_L^2}}$$

where $R_L$ represents the real component of the load impedance and $X_L$ represents
the imaginary component of the load impedance. The source impedance is assumed to
be resistive of value $R_S$.

By measuring the reflection coefficient between the source and the input of the Pi- 
Network, the effectiveness of the Pi-Network to produce a conjugate match may be 
assessed. 

As the reflection coefficient is always positive (0 when all of the power is absorbed
by the load, 1 when the load reflects all of the power), the error function for maximum
power absorption can be expressed simply as:

Matching Fitness = (1 - |ρ|)
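
A small numerical illustration of this fitness measure follows. It assumes the standard magnitude form of the reflection coefficient for a resistive source and a complex load; the function names and the 50 ohm default are hypothetical choices for the example.

```python
import math


def reflection_coefficient(r_load: float, x_load: float,
                           r_source: float = 50.0) -> float:
    """Magnitude of the reflection coefficient for a load r_load + j*x_load
    driven from a purely resistive source r_source."""
    num = (r_load - r_source) ** 2 + x_load ** 2
    den = (r_load + r_source) ** 2 + x_load ** 2
    return math.sqrt(num / den)


def matching_fitness(rho: float) -> float:
    """Linear matching fitness: 1 for a perfect match, 0 for total reflection."""
    return 1.0 - abs(rho)


matched = matching_fitness(reflection_coefficient(50.0, 0.0))      # perfect match
mismatched = matching_fitness(reflection_coefficient(150.0, 0.0))  # 3:1 mismatch
```

A matched 50 ohm load gives a fitness of 1, while a purely resistive 150 ohm load gives a reflection coefficient of 0.5 and hence a fitness of 0.5.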
By defining the error function as simply proportional to 1 - ρ, small changes in ρ
will in turn give rise to small changes in the value of that solution's fitness.
Experimentation revealed that by increasing the rate of change in fitness as ρ tended to
zero, an improvement in the speed of convergence may be achieved. This may be
obtained through the fitness function shown:

Matching Fitness = (1 - |p|)^ 

Harmonic Rejection: Impedance matching is not the only parameter of importance;
another extremely important consideration is the harmonic rejection of the system.
If harmonic rejection is not considered, harmonics of the transmission frequency may
also be transmitted, generating interference at these frequencies for other radio users.

By modelling the transmitter to which the Pi-Matching Network is connected as an
equivalent Thevenin voltage source and impedance (Fig. 2), the harmonic rejection
characteristics of the network may be obtained.




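If $V_S$ is set to 1 volt then $V_{Th}$ may be expressed as follows:

$$V_{Th} = \frac{1/(j\omega_n C_2)}{Z_2 + 1/(j\omega_n C_2)} = \frac{1 + j\omega_n C_1 r_s}{(1 + j\omega_n C_1 r_s) + j\omega_n C_2 (r_s - \omega_n^2 L C_1 r_s + j\omega_n L)}$$

where

$$Z_2 = \frac{r_s}{1 + j\omega_n C_1 r_s} + j\omega_n L = \frac{r_s - \omega_n^2 L C_1 r_s + j\omega_n L}{1 + j\omega_n C_1 r_s}$$

Denoting the real and imaginary components of $V_{Th}$ as $V_{ThRe}$ and $V_{ThIm}$ respectively,
these may be obtained from the above expression as:

$$V_{ThRe} = \frac{1 - \omega_n^4 L C_1^2 C_2 r_s^2 + \omega_n^2 C_1 C_2 r_s^2 + \omega_n^2 C_1^2 r_s^2 - \omega_n^2 L C_2}{(1 - \omega_n^2 L C_2)^2 + (\omega_n C_1 r_s + \omega_n C_2 r_s - \omega_n^3 L C_1 C_2 r_s)^2}$$

$$V_{ThIm} = \frac{-\omega_n C_2 r_s}{(1 - \omega_n^2 L C_2)^2 + (\omega_n C_1 r_s + \omega_n C_2 r_s - \omega_n^3 L C_1 C_2 r_s)^2}$$

The expressions for the real and imaginary components of the Thevenin equivalent
impedance of the Pi-Network and source impedance may be expressed as follows:

$$Z_{ThRe} = \frac{r_s}{(1 - \omega_n^2 L C_2)^2 + (\omega_n C_1 r_s + \omega_n C_2 r_s - \omega_n^3 L C_1 C_2 r_s)^2}$$

$$Z_{ThIm} = \frac{\omega_n^3 L C_1^2 r_s^2 + 2\omega_n^3 L C_1 C_2 r_s^2 + \omega_n L - \omega_n C_1 r_s^2 - \omega_n C_2 r_s^2 - \omega_n^3 L^2 C_2 - \omega_n^5 L^2 C_1^2 C_2 r_s^2}{(1 - \omega_n^2 L C_2)^2 + (\omega_n C_1 r_s + \omega_n C_2 r_s - \omega_n^3 L C_1 C_2 r_s)^2}$$
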

For harmonic rejection, the need is to minimise the power delivered to the load at
the harmonic frequency. Therefore the worst possible case would occur should the
load impedance be the complex conjugate of the Thevenin equivalent impedance at the
harmonic frequency. Minimising the power delivered to the load under these
circumstances ensures minimum power transfer to the load at the harmonic frequency.
During impedance matching, the complex impedances resonate causing the imaginary
impedances of the circuit to cancel. Therefore the total impedance of the circuit
connected to the Thevenin equivalent source is $2 Z_{ThRe}$ and consequently the modulus
of the current flowing through the circuit may be expressed as follows:



I/I- 1 


K. 


— s 


/ 


1 ^ 


Kc 


2 


1 1“ 2* 


z 


Re 


1 4^ 


z 


Re’ 



And the power dissipated into the load is then 



P = \I\ Z Re = - 






4*Z Re 



Minimising this function therefore minimises the harmonic distortion of the desired 
signal. As a ‘fitness’ criterion for the algorithms, this may be described as 
follows: 

Rejection Fitness = 

Therefore the fitness of the circuit improves as the power dissipated into the load at 
the harmonic frequency reduces. In the ideal case the power dissipated into the load at 
the harmonic frequency should tend toward zero; consequently the fitness function 
should reflect this. 

It was discussed earlier how the Pi-Network’s versatility creates problems when 
attempting to control it for optimal performance. When multiple objectives are 
considered, conflicting requirements may occur; for example, in a particular situation 
impedance matching may be achieved through low inductor values, while maximum 
harmonic rejection is achieved through maximum inductance. These conflicts between 
objectives compound the tuning problem by creating more complex error functions, 
containing greater numbers of local minima in which a search algorithm may become 
entrapped. Considering the objectives separately, using Pareto optimality, may 
eliminate this problem and allows the interaction between objectives to be examined. 



4 The Multi Objective Genetic Algorithm 

Multi objective genetic algorithms (MOGA) allow the solution of problems consisting 
of more than one objective to be realised, typically using pareto optimality. The result 
of this approach is a set of possible solutions, covering a range of emphasis on the 
objectives, effectively searching the objective space for every objective weighting. 
This offers significant benefits over conventional genetic algorithms where 
traditionally multiple objectives are realised through the simple aggregation of the 
various objectives. By optimising the system in the pareto sense, a true appreciation of 
the interaction between objectives may be realised. 

Each individual is evaluated in terms of Pareto optimality [7]: each solution is 
evaluated for each objective separately, and the solutions are then compared to find 
the non-dominated solutions. 

For two points in n-dimensional objective space, $X=(x_1,x_2,x_3,\ldots,x_n)$ and 
$Y=(y_1,y_2,y_3,\ldots,y_n)$, X is said to dominate Y when the following conditions 
are met: 

$$(\forall i)\; x_i \le y_i \;\wedge\; (\exists i)\; x_i < y_i$$
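The dominance condition translates directly into code. The sketch below assumes minimisation of every objective and represents a point as a sequence of objective values; the function name `dominates` is illustrative.

```python
def dominates(x, y):
    """Return True if objective vector x dominates y (all objectives minimised)."""
    if len(x) != len(y):
        raise ValueError("objective vectors must have the same length")
    no_worse = all(a <= b for a, b in zip(x, y))        # x_i <= y_i for every i
    strictly_better = any(a < b for a, b in zip(x, y))  # x_i < y_i for some i
    return no_worse and strictly_better
```

Two identical points do not dominate each other, and neither do two points that each win on a different objective.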



Consider a problem where it is required to minimise two cost functions; Fig. 3 
shows a sample set of solutions. As the requirement is to minimise both costs, each 
point is evaluated for each objective separately. Points A, B and C are non-dominated 
solutions, as there are no solutions which provide lower cost values for both 
objectives. Similarly, point D is dominated by all of the other solutions, as every 
other solution provides lower costs for both objectives. For the GA fitness 
evaluation, the approach proposed by Fonseca and Fleming [8] is used, where each 
solution is ranked according to the number of individuals that dominate it. 
Therefore, for a point x dominated by p solutions, the rank of the solution is 
calculated as: 



rank(x) = 1 + p 

Therefore non-dominated solutions would be ranked 1, and increasingly dominated 
solutions would receive higher rankings. Fitness is awarded to the solutions such that 
higher ranking results in lower fitness. Fig. 3 shows the ranking of each individual. 
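The ranking rule can be sketched as follows, assuming minimisation and points given as tuples of objective values. The sample coordinates are hypothetical and are not the points plotted in Fig. 3.

```python
def dominates(x, y):
    """True if x dominates y when every objective is minimised."""
    return (all(a <= b for a, b in zip(x, y))
            and any(a < b for a, b in zip(x, y)))


def pareto_ranks(points):
    """rank(x) = 1 + p, where p is the number of points that dominate x."""
    return [1 + sum(dominates(q, p) for q in points) for p in points]


# Hypothetical two-objective sample: three non-dominated points and two others.
SAMPLE = [(1, 5), (2, 3), (4, 1), (5, 6), (3, 4)]
RANKS = pareto_ranks(SAMPLE)
```

In this sample the first three points receive rank 1, the point (5, 6) is dominated by the four others and receives rank 5, and (3, 4) is dominated only by (2, 3) and receives rank 2.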




5 Multi Objective Simulated Annealing 

The Simulated Annealing algorithm has been successfully applied to numerous 
applications including the well known Travelling Salesman Problem, as well as circuit 
layout design and other such problems [9-13]. Similarly Multiobjective Simulated 
Annealing has been found to provide a suitable alternative to the MOGA [14-15]. 

The process of Simulated Annealing optimisation is as follows: create an initial 
system configuration of variables $(C_1, C_2, L)$ and call this position the ‘datum 
point’, $p_1$. In order to move to a new point, let each variable be allowed to move 
from its datum value by a small quantity $\Delta$, where $\Delta$ has some 
predetermined range, e.g. $(C_1 \pm \Delta, C_2 \pm \Delta, L \pm \Delta)$, thus 
generating a new point, $p_2$. 

The probability of this new point becoming the new datum point is determined 
using the Metropolis Criterion [11], which may be adapted for function maximisation 
as follows: 

$$\text{probability} = e^{(f_2 - f_1)/T} \quad \text{if } f_2 < f_1$$

$$\text{probability} = 1 \quad \text{otherwise}$$

where 

$f_1$ represents the fitness of the datum point, $p_1$ 
$f_2$ represents the fitness of the new point, $p_2$ 
$T$ represents the 'temperature' of the system 

A random number is generated in the range [0,1]; if this number is less than or 
equal to the value of the probability function then $p_2$ becomes the new datum 
point, otherwise $p_1$ remains the datum point. 
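The acceptance test can be sketched as below. This assumes the standard Metropolis form adapted to maximisation, in which a worse point is accepted with probability exp((f_new - f_datum)/T); the function names and the optional `rng` argument are illustrative.

```python
import math
import random


def acceptance_probability(f_datum, f_new, temperature):
    """Metropolis probability of accepting the new point (fitness is maximised)."""
    if f_new >= f_datum:
        return 1.0
    return math.exp((f_new - f_datum) / temperature)


def accept(f_datum, f_new, temperature, rng=random):
    """Draw a uniform number in [0, 1) and compare it with the probability."""
    return rng.random() <= acceptance_probability(f_datum, f_new, temperature)
```

At a high temperature the exponent is close to zero, so even a clearly worse point is accepted almost every time; at a low temperature the same fitness drop is almost always rejected.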

Therefore at high values of T the probability of a new point being selected as the 
new datum point is very high regardless of its fitness; as the temperature reduces, 
the algorithm becomes increasingly selective. This has the effect of initially 
allowing the algorithm to move freely throughout the function space but gradually 
becoming increasingly selective as T is reduced. 

For each temperature, a number of iterations are made; the temperature is then 
reduced by some small amount. This makes the search mechanism slightly more 
selective, because the algorithm becomes less and less likely to select new points 
that are inferior in fitness to the datum point. The process is repeated until a 
predetermined minimum temperature is reached. The initially high temperatures 
allow the algorithm to move freely in the function space and, as a result, navigate 
through local minima. This algorithm may be easily applied to multiobjective 
problems: for an N-objective problem, simply record the probability of selection for 
each objective. 

The overall probability of a new solution being selected may be calculated as the 
product of the probabilities for all objectives, such that the probability of 
selection becomes: 

$$P_{total} = \prod_{i=1}^{N} P_i$$
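A minimal sketch of the multiobjective acceptance rule follows. It assumes each objective is a fitness to be maximised and that each per-objective probability takes the Metropolis form exp((f_new - f_datum)/T) for a worse value and 1 otherwise; the function name is illustrative.

```python
import math


def total_acceptance_probability(f_datum, f_new, temperature):
    """Product over all objectives of the per-objective Metropolis probability."""
    p_total = 1.0
    for f1, f2 in zip(f_datum, f_new):
        p_i = 1.0 if f2 >= f1 else math.exp((f2 - f1) / temperature)
        p_total *= p_i
    return p_total
```

A move that improves one objective and worsens another is therefore penalised only by the objective that gets worse, while a move that worsens every objective is penalised by each of them.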



6 Simulation Results 

In order to allow comparisons between the relative performance of the search 
algorithms to be made, a number of benchmark problems were used (load impedances 
normalised in terms of R^, such as 1+j, 1-j, 2, 0.5+0.5j, 0.5-0.5j). The problems were 
used to evaluate the effectiveness of the algorithms for different conditions. 



6.1 Multiobjective Genetic Algorithm (MOGA) 

Fig. 4 shows the results obtained from 10 simulations of the algorithm providing 
impedance matching and harmonic rejection for a load impedance of 0.5+0.5j; a 
population size of 100 was used. After experimentation, standard multipoint 
crossover was implemented with a crossover rate of 5%, and a standard mutation rate 
of 0.5% was used. 

Fig. 4. Overall Pareto Solution Set For MOGA 



6.2 Elitist MOGA (EMOGA) 

The elitist algorithm consists of a main population of 100 individuals; after 
experimentation, a mutation rate of 0.5% and a crossover rate of 2% were selected. In 
addition, a secondary population of 50 individuals is maintained, containing a 
number of the non-dominated solutions obtained by the algorithm up to that point. 
This population is updated at the end of each generation, replacing individuals that 
are dominated by any of the non-dominated solutions found in the new population. 
Furthermore, 10 individuals are selected at random from the secondary population and 
added to the current population; this is to prevent bias in the search whilst also 
maintaining the search capabilities of the algorithm. Fig. 5 shows the non-dominated 
solutions of the final population of 10 simulations of the algorithm. 
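One way the secondary population could be maintained is sketched below. Several details are assumptions rather than statements from the paper: individuals are represented only by their objective tuples, objectives are minimised, and when the archive overflows the oldest entries are dropped. The names `update_archive` and `reinject` are hypothetical.

```python
import random


def dominates(x, y):
    """True if x dominates y when every objective is minimised."""
    return (all(a <= b for a, b in zip(x, y))
            and any(a < b for a, b in zip(x, y)))


def update_archive(archive, population, max_size=50):
    """Merge the non-dominated members of the population into the archive."""
    candidates = [p for p in population
                  if not any(dominates(q, p) for q in population)]
    merged = list(archive)
    for c in candidates:
        if c in merged or any(dominates(a, c) for a in merged):
            continue  # already stored, or beaten by an archived solution
        merged = [a for a in merged if not dominates(c, a)]  # evict dominated
        merged.append(c)
    # Assumed truncation rule: keep the most recently added entries.
    return merged[-max_size:]


def reinject(archive, population, n=10, rng=random):
    """Add up to n randomly chosen archive members to the current population."""
    k = min(n, len(archive))
    return list(population) + rng.sample(list(archive), k)
```

For example, merging the population [(2, 2), (4, 4), (0.5, 6)] into the archive [(1, 5), (3, 3)] evicts (3, 3), which is dominated by (2, 2), and leaves [(1, 5), (2, 2), (0.5, 6)].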

6.3 Multi Objective Simulated Annealing (MOSA) 

The Simulated Annealing Algorithm used was given an initial temperature $T_0 = 100$. 
The algorithm was allowed to make 50 iterations at each temperature setting, the 
cooling rate was set to $T_{new} = 0.999\,T_{old}$, and the terminating temperature 
was T=0.1. In similar fashion to the elitist MOGA, a population of dominant solutions 
is maintained and updated at each iteration, recording the 50 best solutions obtained 
by the algorithm. Again the obtained solutions are displayed in Fig. 6. 
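The size of this schedule can be estimated with a short calculation. The sketch below assumes purely geometric cooling from the initial to the terminating temperature and 50 iterations at every temperature level; the function name is illustrative.

```python
import math


def cooling_steps(t_start, t_end, rate):
    """Number of geometric cooling steps needed to bring t_start to t_end or below."""
    return math.ceil(math.log(t_end / t_start) / math.log(rate))


STEPS = cooling_steps(100.0, 0.1, 0.999)  # temperature reductions required
MOVES = STEPS * 50                        # candidate moves at 50 per temperature
```

With these settings the schedule runs through roughly 6,900 temperature levels, which corresponds to a few hundred thousand candidate moves per simulation.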
Fig. 5. Overall Solution Set for EMOGA 




Fig. 6. Overall Solution Set Obtained by MOSA 



6.4 Evaluating Performance 

Assessing MOGA performance poses significant problems, as this is in itself a multi 
criteria problem. When comparing the final non-dominated solutions of two MOGA 
implementations, two significant criteria must be considered: the range of 
solutions obtained and how well they represent the true Pareto optimal front of the 
system. It shall be assumed that the true Pareto front is unknown; therefore the only 
means of evaluation available is to compare the MOGA solutions against each other. 

Expectation Measure. The paper proposes to use two modified versions of the 
measurement method of Zitzler and Thiele [16-17], as follows. 

Let X’ and X” be two Pareto optimal fronts, where X’ represents the target solutions 
and X” the comparable solutions. The function C maps the ordered pair (X’,X”) to the 
interval [0,1]: 

$$C(X', X'') := \frac{|\{a' \in X' ;\ \exists\, a'' \in X'' : a' \succ a''\}|}{|X'|} \qquad (3)$$

This function simply records the proportion of solutions in X’ which dominate 
solutions within X”. 

In a similar fashion, a second measure may be used to record the proportion of X’ 
which are dominated by X”: 

$$D(X', X'') := \frac{|\{a' \in X' ;\ \exists\, a'' \in X'' : a'' \succ a'\}|}{|X'|} \qquad (4)$$

The combination of these functions may then be used to provide greater insight into 
the relative performance of the two algorithms, providing information on the regions 
of dominant solutions, dominated solutions and non-dominated solutions. In addition, 
by carrying out a number of simulations of each algorithm and recording the values 
of the above functions, a measure of the algorithms’ robustness, reliability and 
quality of solutions may be obtained. Table 1 compares each of the algorithms for 
C(X’,X”), whilst Table 2 displays the results for D(X’,X”). 
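Both measures can be sketched in a few lines. The code below follows the verbal definitions (the share of X’ that dominates at least one member of X”, and the share of X’ dominated by at least one member of X”) and assumes minimisation; the function names and the sample fronts are illustrative.

```python
def dominates(x, y):
    """True if x dominates y when every objective is minimised."""
    return (all(a <= b for a, b in zip(x, y))
            and any(a < b for a, b in zip(x, y)))


def c_measure(front_a, front_b):
    """Fraction of front_a that dominates at least one member of front_b."""
    hits = sum(any(dominates(a, b) for b in front_b) for a in front_a)
    return hits / len(front_a)


def d_measure(front_a, front_b):
    """Fraction of front_a that is dominated by at least one member of front_b."""
    hits = sum(any(dominates(b, a) for b in front_b) for a in front_a)
    return hits / len(front_a)


# Hypothetical fronts for illustration only.
FRONT_1 = [(1, 4), (2, 2), (4, 1)]
FRONT_2 = [(2, 5), (3, 3), (0.5, 6)]
```

Here two of the three points in `FRONT_1` dominate something in `FRONT_2`, while none of them is dominated in return. Averaging both measures over repeated runs gives the kind of mean and standard deviation reported in the tables.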



Table 1. Results of C(X’,X”) 

              X”=MOGA               X”=EMOGA              X”=MOSA 
              mean     std. dev.    mean     std. dev.    mean     std. dev. 
X’=MOGA       0.1538   0.0986       0.0679   0.0649       0.0084   0.0222 
X’=EMOGA      0.2331   0.1024       0.0169   0.029        0.039    0.046 
X’=MOSA       0.2537   0.0995       0.1614   0.0146       0.0441   0.022 




7 Discussion of Results 



Examination of the results displayed in Table 1 shows that the MOSA is able to locate 
solutions superior to those obtained by its competitors (approx. 25% of the solutions 
obtained dominate the solutions obtained by the MOGA, whilst approx. 16% dominate 
the solutions obtained by the EMOGA). Similarly it may be shown that the solutions 
obtained by the EMOGA are superior to the solutions obtained using MOGA alone 
(C(EMOGA, MOGA) = 0.2331 against C(MOGA, EMOGA) = 0.0679). Therefore the results in 
Table 1 demonstrate the relative performance of these algorithms in locating 
dominant solutions. 

Examination of Table 2 also identifies the proportion of the solutions obtained by 
the algorithms that are dominated by their competitors. Again this shows that the 
MOSA has a very small proportion of solutions dominated by those of either the 
MOGA or EMOGA. 



Table 2. Results of D(X’,X”) 

              X”=MOGA               X”=EMOGA              X”=MOSA 
              mean     std. dev.    mean     std. dev.    mean     std. dev. 
X’=MOGA       0.143    0.1004       0.2928   0.1286       0.5659   0.1373 
X’=EMOGA      0.0355   0.0355       0.0141   0.0215       0.2449   0.095 
X’=MOSA       0.0013   0.0034       0.0114   0.0143       0.0531   0.0286 



Further examination of the results for C(MOSA, MOSA) and D(MOSA, MOSA) 
also suggests a high level of robustness for the MOSA, showing a very low probability 
of a solution being able to dominate a solution obtained by a previous simulation of 
the MOSA and vice versa. This is clearly not the case for the MOGA, as examination 
of Fig. 4 demonstrates the high level of fluctuation in the obtained solutions. This is 
confirmed by examination of Table 2, which demonstrates that a solution obtained by 
the MOGA has approximately a 15% chance of being dominated by a solution 
obtained by a previous simulation of the MOGA. 

In addition to the quantitative evaluation of the solutions obtained by these 
algorithms, another factor is the speed of these algorithms to obtain the final set of 
solutions. The results were obtained using a Pentium II processor; the simulations 
were carried out using MATLAB. The approximate simulation times for each of the 
algorithms are listed below; 

MOGA: approx. 60 minutes 

EMOGA: approx. 60 minutes 

MOSA: approx. 10 minutes 

Therefore the increased speed in convergence and quality of solutions obtained by 
the MOSA when compared against the MOGA and EMOGA make this algorithm a 
clear candidate for multiobjective problems. 



8 Conclusion and Further Research 



The paper has presented a number of approaches that may be applied to filter tuning. 
Evaluation of these algorithms has been obtained using qualitative methods, where the 
algorithms’ ability to converge was inspected by examining the obtained Pareto front, 
allowing a measure of the algorithms’ repeatability and ability to locate the true 
Pareto front of the system. More rigorous quantitative evaluation methods adapted 
from Zitzler et al. [16-17] were then used to inspect the algorithms’ repeatability 
as well as to provide tools for direct comparison of the algorithms’ performance. The 
original method proposed by Zitzler et al. has been modified such that the proportion 
of the obtained Pareto front solutions that are found to be dominant, non-dominated 
or dominated by a competitor’s Pareto front may be obtained. Further investigation 
into the assessment of the degree of dominance of the obtained Pareto solutions may 
be pursued [18-19]; this will provide further tools for algorithm comparison and 
assessment. In addition, further work may explore the ability of these algorithms to 
provide solutions for non-ideal circuits, considering such characteristics as 
internal resistance or other parasitic effects. These characteristics could then be 
incorporated into the circuit model. Furthermore, additional characteristics or 
constraints may be placed upon the system to meet various issues such as broadband 
impedance matching, minimal component values or restricted "off the shelf" component 
values. 
Abstract. When tackling the multicriteria optimization of a device in electrical 
engineering, the exhaustive sampling of the Pareto optimal front implies the use of 
complex and time-consuming algorithms that are impractical from the industrial 
viewpoint. In several cases, however, the accurate identification of a few 
non-dominated solutions is sufficient for design purposes. An evolutionary 
methodology of lowest order, dealing with a small number of individuals, is 
proposed to obtain a cost-effective approximation of non-dominated solutions. 
In particular, the algorithm assigning the fitness enables the designer to pursue 
either shape or performance diversity of the device. The optimal shape design 
of a shielded reactor, based on the optimization of both cost and performance of 
the device, is presented as a real-life case study. 



1 Introduction 

Optimal design in electromagnetism has a long history, from Maxwell (1869) on. In 
other fields of engineering like structural mechanics the history of optimal design is 
even longer, dating back to Lagrange (1770). In the latter area the modern 
development has taken place over the past three decades, anticipating the analogue 
development in electromagnetism and, to some extent, fostering it. In more recent 
years it has been possible to integrate the analysis of electromagnetic field with 
optimization techniques, so moving from computer-aided design (CAD) to automated 
optimal design (AOD). 

The essential goal of AOD in electromagnetics is that of identifying, in a 
completely automatic way, the system or the device that is able to provide some 
prescribed performance, e.g. to minimize weight and materials cost or to maximize 
some output, taking into account physical constraints and geometrical bounds. This is 
actually an inverse problem and implies the simultaneous minimization of conflictual 
objectives. 

In real-life engineering the presence of a single criterion or objective is somewhat 
an exception or a simplification. Therefore, the future of computational 
electromagnetics seems to be oriented towards, and conditioned by, the development 
of efficient methodologies and robust algorithms for solving multicriteria design 
problems. 
From a formal viewpoint, a multicriteria problem is cast as follows: 

$$\min_x F(x)$$

$$\text{subject to}\quad g(x) \le 0, \qquad h(x) = 0$$

where $F(x)=(f_1(x),\ldots,f_m(x))$ is a vector of m criteria or objectives, 
$x=(x_1,\ldots,x_n)$ is the vector of n design variables defining the device or the 
system, and g(x) and h(x) are inequality and equality constraints, respectively. In 
general, the utopia solution x*, i.e. that minimizing all $f_i$ simultaneously, does 
not exist and the so-called Pareto solutions are accepted, i.e. those for which no 
decrease in any of the criteria is obtained without a simultaneous increase in at 
least one of the other criteria. 

Traditionally, multicriteria problems are reduced to singlecriterion problems, for 
instance by means of one of the following procedures: 

i) the use of a penalty function composed of the various criteria; 

ii) the separate solutions of singlecriterion problems and their trade-off; 

iii) the solution of a singlecriterion problem, taking the other criteria as constraints. 
This approach leads to classical methods of multiobjective optimization and gives a 
solution which is supposed to be the optimum. 

Often in the design of electromagnetic devices a satisfactory way to tackle the 
problem of multicriteria optimisation consists of applying the Pareto optima theory in 
connection with a suitable minimization algorithm. The result is a set of non- 
dominated solutions: in principle, all of them are optimal; in practice, each of them 
corresponds to a different degree of minimization of the single objectives. 

Moreover, though attractive, the non-dominated approach often proves unaffordable 
from the computational viewpoint; in fact, the evaluation of each objective may 
imply heavy non-linear field analyses in three-dimensional geometries. 
Consequently, the aim of a reliable method of multicriteria optimization should be to 
approximate the Pareto optimal front by fulfilling three requirements: 

- convergence to the front independent of the number, even very low, of 
non-dominated solutions; 

- remarkable diversity among non-dominated solutions; 

- moderate computational cost. 

An attempt towards this goal is here presented. 



2 EMO Strategy: Methodological Aspects 

The aim of a stochastic multiobjective optimiser based on non-dominated sorting is 
to obtain as many solutions as possible lying on the Pareto optimal set while 
preserving diversity among them. 

GA-based strategies [1],[2],[6] typically require some hundreds of individuals to 
ensure convergence. Moreover, when dealing with real-life optimization problems 
in electrical engineering, the evaluation of each objective often requires a FEM 
solution lasting several minutes [7], [10]. This difficulty often makes the use of 
GA-based strategies computationally unaffordable or highly impractical from an 
industrial point of view. 

Therefore we have decided to adopt a (1+1) ES algorithm as the optimization 
engine of the multiobjective strategy shown in Fig. 1 and Fig. 2 because, in our 
experience, it is robust and gives good convergence even when few individuals are 
considered. It should be noted that the generation, mutation and annealing steps are 
implemented in parallel; this is possible because in our implementation individuals 
do not interact with each other during the whole process, apart from the steps of 
Pareto ranking and fitness evaluation. In practice, the general structure of the 
algorithm is the same as that of NSGA, whereas the genetic operators have been 
replaced with those of the evolution strategy. 

Two criteria must be pursued when assigning the fitness value to each individual: 

1) forcing global convergence to the Pareto optimal set; 

2) forcing diversity among solutions belonging to the same set. 



[Flowchart: Initial Population; Classify Individuals into Pareto Sets. Legend: 
GEN = Generation, MUT = Mutation, ANN = Annealing, C = Constraints checking] 

Fig. 1. Non-dominated Sorting Evolution Strategy Algorithm (NSESA): flowchart 

For the first purpose, the fitness of each individual is evaluated according to the 
Pareto set to which it belongs. For the second purpose, a sharing procedure is 
implemented within the current set in order to favour isolated solutions and prevent 
clustering. This step is particularly delicate when using a small number of individuals 
(say 5 to 10), and some changes with respect to classical sharing procedures [5], [8] 
are proposed here. 

In general, when implementing a fitness sharing procedure, diversity of individuals 
in either the design space or the objective space can be considered. Moreover, 
solutions with strong diversity in shape can be characterised by weak diversity in 
objective value (and vice versa). Both procedures can lead to results useful for 
the device designer, who is interested in both shape and performance diversity of 
optimal solutions. This is why a sharing procedure in only one of the two spaces 
cannot guarantee a satisfactory approximation of the Pareto optimal front in the other 
space. 
Fig. 2. Fitness assignment algorithm: flowchart 

In more detail, given a population sorted into Pareto sets, we first consider the 
first set and assign a dummy fitness dfit to each individual as shown in Fig. 2, where 
$cw_1$ is the center of weight of the first front and upoint the utopia point. 

In order to set up the sharing procedure we then evaluate the normalized average 
distances $d_{ij}$ among elements, in both the design and the objective domain. 
Afterwards we implement the standard sharing formulas [3], [4] for the calculation of 
the sharing parameter $sh_{ij}$ and the penalty coefficient $m_i$; we evaluate the 
niche radius $\sigma$ in the following way: 

considering the radius in the design space or in the objective space when shape or 
performance diversity, respectively, has to be enhanced; ndof and nobj are the number 
of design variables and the number of objectives, respectively. 

Finally, the fitness value of the i-th individual is evaluated and assigned. Before 
moving to the (k+1)-th front a new dummy fitness $dfit_{k+1}$ has to be evaluated, as 
shown in Fig. 2. In order to increase the convergence rate, the new value of dummy 
fitness depends on the centers of weight $cw_k$ and $cw_{k+1}$ of the current and 
next set, respectively. The procedure is repeated for all successive sets. 
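The sharing step within one Pareto set can be sketched as follows. This is the classical sharing scheme rather than the modified one proposed in the paper: it assumes the triangular sharing function sh(d) = 1 - (d/sigma)^alpha for d < sigma and 0 otherwise, a niche count equal to the sum of the sharing values in a row, and a shared fitness equal to the dummy fitness divided by the niche count. All names and the exponent `alpha` are assumptions.

```python
def sharing(distance, sigma, alpha=1.0):
    """Classical sharing function: 1 at zero distance, 0 beyond the niche radius."""
    if distance < sigma:
        return 1.0 - (distance / sigma) ** alpha
    return 0.0


def shared_fitness(dfit, distances, sigma, alpha=1.0):
    """Divide the dummy fitness of each individual by its niche count.

    distances is a square matrix of normalised distances within one Pareto
    set, with zeros on the diagonal so every niche count is at least 1.
    """
    result = []
    for row in distances:
        niche_count = sum(sharing(d, sigma, alpha) for d in row)
        result.append(dfit / niche_count)
    return result


# Hypothetical set: individuals 0 and 1 are close together, 2 is isolated.
DIST = [[0.0, 0.1, 0.9],
        [0.1, 0.0, 0.8],
        [0.9, 0.8, 0.0]]
FITNESS = shared_fitness(10.0, DIST, sigma=0.5)
```

The isolated individual keeps the full dummy fitness, while the two clustered ones are penalised, which is the behaviour the sharing procedure is meant to produce. The distance matrix may be computed in the design space or in the objective space, depending on which kind of diversity is pursued.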

We point out that convergence towards the optimal front is always controlled in the 
objective space, while sharing procedures can be performed in either design space or 
objective space. 

The following convergence indexes have been defined; 




Finally, three stopping criteria have been implemented: 

a) maximum number of iterations; 

b) minimum value of convergence index in the objective space; 

c) maximum number of iterations with no improvement found. 

Results of previous investigations [9], [11] on simplified test problems have validated 
the effectiveness of the strategy proposed. 



3 EMO Strategy: Numerical Aspects 

Several test cases on real-valued analytical functions have been carried out for 
validating the code implemented. Here we show results for one of them, namely the 
Deb’s t, problem. It is characterized by two variables and two objectives, giving rise 
to a non-connected Pareto front; the problem can be defined as follows: 

$$\min\,(f_1, f_2) \qquad (4)$$

$$f_1(x_1) = x_1$$

$$f_2(x_1, x_2) = 1 + 9x_2 - \sqrt{x_1\,(1 + 9x_2)} - x_1 \sin(10\pi x_1)$$

where $(x_1, x_2) \in (0,1)\times(0,1)$ 
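The test problem is cheap to evaluate, which makes it convenient for validating an optimiser before moving to a field analysis. A minimal sketch follows; the function name is an assumption.

```python
import math


def deb_test(x1, x2):
    """Two-variable, two-objective test problem with a disconnected Pareto front."""
    g = 1.0 + 9.0 * x2
    f1 = x1
    f2 = g - math.sqrt(x1 * g) - x1 * math.sin(10.0 * math.pi * x1)
    return f1, f2
```

Since f2 can only increase with x2 for a fixed x1, the Pareto-optimal solutions lie at the lower limit of x2; the sine term then splits the front into separate branches along x1.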
Fig. 3. NSESA: 20 individuals solution for validation test 

As can be seen from Fig. 3, a solution composed of twenty individuals has been 
found; the individuals are distributed along three of the five branches of which the 
Pareto optimal front (POF) is composed. The starting population was chosen at random 
in the design space. Given the ik-th individual at the iter-th iteration, the 
following two expressions have been used in design space and in objective space, 
respectively, in order to quantify the POF approximation error all along the 
evolution: 

$$error_x(ik, iter) = x_2(ik, iter) \qquad (5)$$

$$error_f(ik, iter) = f_2(ik, iter) - 1 + \sqrt{f_1(ik, iter)} + f_1(ik, iter)\,\sin(10\pi f_1(ik, iter))$$

The log value of both errors is plotted in Fig. 4 with reference to a single individual. 
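Both error measures can be computed directly from an individual's variables and objectives. The sketch below assumes that the front corresponds to x2 = 0, so that the design-space error is x2 itself and the objective-space error is the vertical distance from the curve f2 = 1 - sqrt(f1) - f1 sin(10 pi f1); function names are illustrative.

```python
import math


def error_x(x2):
    """Design-space error: distance of x2 from its Pareto-optimal value of zero."""
    return x2


def error_f(f1, f2):
    """Objective-space error: vertical offset of (f1, f2) from the front curve."""
    return f2 - 1.0 + math.sqrt(f1) + f1 * math.sin(10.0 * math.pi * f1)


def objectives(x1, x2):
    """Test problem used for validation (same form as in the text)."""
    g = 1.0 + 9.0 * x2
    return x1, g - math.sqrt(x1 * g) - x1 * math.sin(10.0 * math.pi * x1)
```

A point with x2 = 0 gives zero for both errors, and both grow as x2 moves away from zero, so plotting their logarithm against the iteration count shows how quickly an individual approaches the curve.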



4 An Industrial Case Study 



4.1 The Device 

The shape optimization of a single-phase series reactor for power applications is 
considered [12]; the reactor is employed to reduce the peak value of short-circuit 
current and so to mitigate its electrodynamical effects. 





Fig. 4. History of approximation errors (errorx, errorf) for the solution shown in Fig. 3 

The log value of both convergence indexes is plotted in Fig. 5. 

Fig. 5. History of convergence indexes for the solution shown in Fig. 3 

Fig. 6. Cross-section of the reactor (one quarter) and design variables 

The reactor, the cross-section of which is shown in Fig. 6, is characterised by a 
coreless winding with cylindrical shape (foil winding); it is boxed in a laminated 
magnetic shield with rectangular shape in order to protect the surrounding 
environment from the strong stray field. The latter, in turn, gives rise to power losses 
in the winding that limit the operation of the device. The higher the winding, the 
lower the stray field; on the other hand, the realization of a higher winding and 
shield, though reducing the effect of leakage, causes an increase of volume and cost 
of the reactor, so that a conflict of design criteria arises. For a prototype reactor 
rated 5.9 MVA at a nominal current of 893 A the following values hold: h=500 mm, 
dm=590 mm, a=210 mm, d=80 mm, t=40 mm, N=212, filling factor of the winding 
k=0.504. 

4.2 Analysis 

The distribution of the magnetostatic field in the reactor, for which rectangular 
symmetry is assumed, is governed by Poisson’s equation in terms of the vector 
potential A=(0,0,A) 

$$-\operatorname{div}\left(\frac{1}{\mu}\,\operatorname{grad} A\right) = J \qquad (6)$$

subject to boundary conditions A=0 along x=0 and elsewhere; J=3.57 A mm$^{-2}$ is the 
current density in the winding while |i=l and ll,=10* are the values assumed for 
relative permeability of non-magnetic materials and iron, respectively. To solve (6) 
numerically, the two-dimensional field region shown in Fig. 6 has been discretized by 
means of a regular grid of finite elements, namely triangles with quadratic variation of 
potentials; the total number of elements is approximately ne=950. The evolutionary 
optimizer calls the MagNet code [13] for performing the field analysis and then 
updates the finite element grid at each iteration. 



4.3 Design 

In general, up to seven design variables defining the shape of the device can be 
considered: geometric height h, mean diameter d_m, radial thickness of the winding a, 
number of turns N, axial distance d between winding and magnetic shield, thickness s 
of the shield, radial distance t between winding and shield. 

Two conflictual criteria can be defined: 

- the material cost fi of the reactor, namely the weighted sum of copper and iron 
weights, to be minimized: 



fi = 





( d^+ a \ 


(h , \ 


s 


— + t K + 5 




_ 


1 2 ; 


U )\ 



+ k^w^kjah 



(7) 



with $k_i=1$, $k_c=3$, while $w_i=7860$ kg m$^{-3}$ and $w_c=8930$ kg m$^{-3}$ are 
the mass densities of iron and copper, respectively; 

- the fringing field $f_2$ inside the winding, i.e. the mean radial component of 
magnetic induction in the cross-section of the winding, to be minimized as well: 



/ 2 - 



1 NW 



( 8 ) 



where NW=64 is the number of points of a grid sampling the radial induction in the 
winding. 

The minimisation of the fringing field has two important benefits: from a global 
point of view it leads to a strong reduction of additional losses in the winding and thus 
increases the efficiency of the reactor; on the other hand, the probability of local 
overheating inside the coil and its consequent failure is reduced. 

The following constraints have been prescribed: 

- the rated value of inductance L=23.57 mH; 

- the induction in the core, not exceeding 0.8 T, when the current is equal to 
- the current density in the winding; 

- the insulation gaps d and t between winding and core. 

Consequently, three independent design variables have been selected, i.e. height h, 
mean diameter d_m and number of turns N of the winding. Finally, a set of 
bounds preserves the geometrical congruency of the model, namely: 

$$0.5 \le h \le 1.5\ \text{m}, \qquad 0.1 + 2a \le d_m \le 1.8\ \text{m}, \qquad 162 \le N \le 262 \qquad (9)$$

The sensitivity surfaces of the fringing field $f_2$ and of the cost $f_1$ against 
(h, d_m), for a given number of turns N=200, are reported in Fig. 7 and Fig. 8, 
respectively. 




Fig. 7. Average field in the winding as a function of mean diameter d_m and height h of the 
winding itself 

Fig. 8. Normalized cost of the reactor as a function of mean diameter d_m and height h of the 
winding (axes: h = X(1), d_m = X(2)) 

The conflict between the two objectives is evident from the comparison of both 
surfaces. 



570 P. Di Barba, M. Farina, and A. Savini 



5 Results and Discussion 



Aiming at a preliminary investigation, the search space has been randomly sampled 
by means of 4000 points uniformly distributed; the approximation of the objective 
space shown in Fig. 9 has then been obtained (in the figure only 1000 samples are 
represented). 

Fig. 9. Objective space: random samples and Pareto solutions (*) 



From a practical point of view the optimal front appears to be globally convex, 
connected and composed of two parts, the one being deceptive and strongly Pareto, 
the other being non deceptive and weakly Pareto [14]. Moreover, a sub-region 
concentrating the majority of samples is evident; it corresponds to the weakly Pareto 
front. As a consequence, individuals during evolution are strongly attracted towards 
this sub-region. 

The EMO strategy has been run in two cases, each considering 10
individuals whose initial values have been randomly selected; diversity of individuals
in the objective space has been pursued in both cases. After overlapping the two sets
of solutions, the approximation of the Pareto optimal front marked in Fig. 9 has
been obtained.
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Fig. 10. Shape of extreme solutions: minimum cost (left) and minimum stray configurations 

(dimensions in m) 





Fig. 11. Fringing field in the winding for the minimum cost configuration 

In Fig. 10 the device geometries corresponding to the Pareto-optimal extreme 
solutions are shown; the corresponding distributions of stray field in the winding are 
reported in Fig. 11 and Fig. 12 respectively. The variability of both shape and 
performance is evident. 
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In order to estimate the maximum cost totcost of the EMO strategy implemented, 
the following formula holds 

$\text{totcost} = \text{niter} \times \text{npop} \times \text{nobj} \times \text{femtime}$

where the number of objective functions is nobj = 2 to 3 and the maximum number of iterations
is niter = 300. In our experience of real-life problems, the typical number of individuals is
npop = 5 to 20, while the cost of a single FEM analysis is femtime = 1 to 5 min.
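As a worked example of the cost formula, the short sketch below evaluates it at the two ends of the ranges quoted above. The function name is chosen here for illustration only.

```python
def total_cost_minutes(niter, npop, nobj, femtime_min):
    """Upper bound on run time: niter x npop x nobj x femtime (in minutes)."""
    return niter * npop * nobj * femtime_min

# Most expensive and cheapest combinations of the quoted ranges
worst = total_cost_minutes(niter=300, npop=20, nobj=3, femtime_min=5)
best = total_cost_minutes(niter=300, npop=5, nobj=2, femtime_min=1)
print(worst / 60, "hours")  # 1500.0 hours
print(best / 60, "hours")   # 50.0 hours
```

The spread of more than an order of magnitude shows why keeping the population small matters when each evaluation requires a FEM analysis.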

As for the case study developed, due to the linear magnetostatic analysis and the 
inexpensive evaluation of/^ , we had femtime xtiobj=Q.2> min thus requiring some 48 
hours for the stopping criterion to be satisfied. 





Fig. 12. Fringing field in the winding for the maximum cost configuration 



6 Conclusion 

In real-life engineering, when adopting an algorithm of multiobjective optimisation 
based on Pareto optimality, it is of primary importance to reduce the number of calls 
to the objective function, often requiring a FEM analysis. In the paper, a cost- 
effective EMO strategy has been developed and applied to the shape design of a 
realistic electromagnetic device. 

From the methodological viewpoint, the results show that a lowest-order evolution
strategy algorithm with a small number of individuals (5 to 10) can be conveniently
used as the engine of the multiobjective optimization. Nevertheless, the procedure of
fitness assignment should be modified with respect to classical formulas. In fact, the
latter refer to a large number of individuals (50 to 100) and depend on some tuning
parameters, usually defined by means of empirical formulas. In the paper a fitness
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assignment procedure is proposed, making the formulas forcing diversity of 
individuals univocal and easy-to-implement. 

Turning to the case study, the cost and performance optimisation of a shielded
reactor has been achieved. A large number of configurations belonging to the Pareto
optimal front have been identified, so offering the designer an effective choice among
devices that range from the best performing one to the least expensive one. From an
industrial point of view, having a set of Pareto-optimal solutions makes it easy to
fulfil a posteriori technology-related constraints that are typical of real-life
engineering, whereas in scalar optimization they have to be carefully prescribed a
priori in order for the single solution to be feasible.

Finally, the proposed strategy allows the designer to pursue either shape or 
performance diversity of Pareto-optimal devices. 
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Abstract. In High Dose Rate (HDR) brachytherapy the conventional 
dose optimization algorithms consider the multiple objectives in form of 
an aggregate function which combines individual objectives into a single 
utility value. As a result, the optimization problem becomes single ob- 
jective, prior to optimization. Up to 300 parameters must be optimized 
satisfying objectives which are often competing. We use multiobjective 
dose optimization methods where the objectives are expressed in terms of 
quantities derived from dose-volume histograms or in terms of statistical
parameters of dose distributions from a small number of sampling points. 
For the last approach we compare the optimization results of evolution- 
ary multiobjective algorithms with deterministic optimization methods. 
The deterministic algorithms are very efficient and produce the best results.
The performance of the multiobjective evolutionary algorithms is
improved if a small part of the population is initialized by deterministic 
algorithms. 



1 Introduction 

High dose rate brachytherapy is a treatment method for cancer where empty
catheters are inserted within the tumor volume. Once the correct position of
these catheters is verified, a single $^{192}$Ir source is moved inside the catheters at
discrete positions (dwell positions) using a computer controlled machine. The
problem that we consider is the determination of the n dwell times (sometimes
also called dwell position weights or simply weights) for which the
source is at rest and delivers radiation at each of the n dwell positions, resulting
in a three-dimensional dose distribution which fulfills the defined quality criteria.
In modern brachytherapy, the dose distribution has to be evaluated with
respect to the irradiated normal tissues and the Planning Target Volume (PTV),
which includes, besides the Gross Tumor Volume (GTV), an additional margin
accounting for position inaccuracies, patient movements, etc. Additionally, for
all critical structures, either located within the PTV or in its immediate vicinity
or otherwise within the body contour, the dose should be smaller than a
critical dose $D_{crit}$. In practice it is difficult, if not impossible, to meet all these
objectives. Usually, the above mentioned objectives are mathematically quantified
separately, using different objective functions, and then added together in
various proportions to define the overall treatment objective function.

The number of source positions varies from 20 to 300. It is therefore a high-dimensional
problem with competing objectives. The use of a single weighted sum
leads to information loss and is not generally to be recommended, especially for
non-convex problems and for those cases where the objectives do not have the same
dimensions and, in addition, may be competing. An understanding of which objectives
are competing or non-competing is valuable information. We therefore
use multiobjective evolutionary algorithms in HDR brachytherapy. One algorithm
is based on the optimization of dose-volume histograms (DVH), which
describe the distribution of the dose within an object, or of distributions derived
from them. These distributions are evaluated for the PTV, the surrounding
tissue and organs at risk from a set of up to 100000 sampling points [2]. The
calculation of the DVH requires a considerable amount of time, and for implants
with 300 sources the optimization requires a few hours. Another limitation of
this method is that a comparison with deterministic algorithms is not possible.
We have therefore considered the optimization of the dose distribution using as
objectives the variance of the dose distribution on the PTV surface and within
the PTV, obtained from a set of 1500-4000 sampling points. These functions are
convex and a unique global minimum exists.

In the past, comparisons of the effectiveness of evolutionary algorithms have
been made either with other evolutionary algorithms or with manually optimized
plans. We have compared the Pareto fronts obtained by multiobjective
evolutionary algorithms with the Pareto fronts obtained by a weighted
sum approach using deterministic optimization methods such as quasi-Newton
algorithms and Powell's modified conjugate gradient algorithm, which does not
require derivatives of the objective function [3].

2 Methods 

2.1 Calculation of the Dose Rate 

The dose rate around each of the small cylindrical shaped sources is dominated
by the $1/r^2$ term with modifications due to absorption and scattering in the
surrounding material. The dose value $d(\mathbf{r})$ at $\mathbf{r} = (x, y, z)$ is:



$$d(\mathbf{r}) = \sum_{i=1}^{N_s} w_i K(\mathbf{r} - \mathbf{r}_i) \qquad (1)$$

In (1), $\mathbf{r}_i$ is the position of the i-th source and $N_s$ the total number of sources.
$K(\mathbf{r} - \mathbf{r}_i)$ is the dosimetric kernel describing the dose rate per unit source strength
at $\mathbf{r}$ from a source positioned at $\mathbf{r}_i$. The dwell position weight $w_i = S_k t_i$ is
proportional to the strength $S_k$ of the single stepping source, where $t_i$
is the dwell time of the i-th source dwell position. Because of the high dose
gradients a dose specification at a single point inside the PTV is not possible
in interstitial brachytherapy. For this reason we use as a reference dose $D_{ref}$ the
average dose value at the PTV surface.
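The dose sum and the reference dose can be illustrated with a minimal sketch. It assumes the simplified point-source kernel $K = 1/r^2$ that the Results section adopts, without attenuation or scattering; the function names and the two-source example are hypothetical.

```python
def kernel(r, r_i):
    """Assumed kernel K = 1/|r - r_i|^2 (no attenuation or scattering)."""
    dist_sq = sum((a - b) ** 2 for a, b in zip(r, r_i))
    return 1.0 / dist_sq

def dose(r, source_positions, weights):
    """Dose at point r: sum over sources of w_i * K(r - r_i)."""
    return sum(w * kernel(r, r_i) for r_i, w in zip(source_positions, weights))

def reference_dose(surface_points, source_positions, weights):
    """D_ref taken as the average dose over sampling points on the PTV surface."""
    doses = [dose(p, source_positions, weights) for p in surface_points]
    return sum(doses) / len(doses)

# Hypothetical two-source example
sources = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
weights = [1.0, 2.0]
print(dose((0.0, 1.0, 0.0), sources, weights))  # 1/1 + 2/2 = 2.0
```

Since the kernel values depend only on geometry, they can be computed once per sampling point and source and reused for every candidate set of weights.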

2.2 Dose- Volume Histogram Based Optimization Using the 
Conformal Index 

In the paper of Baltas et al. [7] a conformal index (COIN) was proposed as a
measure of implant quality and dose specification in brachytherapy. This index
takes into account patient anatomy, both of the tumor and normal tissues and
organs, see Fig. 1.







Fig. 1. Two-dimensional schematic diagram of the COIN = $c_1 c_2$ based optimization.
The coefficients $c_1$ and $c_2$ consider the coverage of the PTV by the isosurface with the
prescription dose $D_{ref}$ and parts of the tissue surrounding the PTV.



COIN is defined as: 



$$\mathrm{COIN} = c_1 \cdot c_2 \qquad (2)$$

where $c_1 = V_{PTV}^{ref} / V_{PTV}$ and $c_2 = V_{PTV}^{ref} / V_{ref}$. The coefficient $c_1$ is the
fraction of the PTV, $V_{PTV}^{ref}$, that is enclosed by $D_{ref}$ and is a measure of how
accurately the PTV is covered by $D_{ref}$. The coefficient $c_2$ is the fraction of the
volume of the reference dose, $V_{ref}$, that is covered by the PTV. It is also a measure
of how much normal tissue outside the PTV is covered by $D_{ref}$. COIN can be
calculated from the cumulative DVHs of the PTV and the body at the reference
dose $D_{ref}$, i.e. $DVH_{PTV}(D_{ref})$ and $DVH_{body}(D_{ref})$, respectively:



$$\mathrm{COIN} = \frac{V_{PTV} \cdot [DVH_{PTV}(D_{ref})]^2}{100 \cdot V_{body} \cdot DVH_{body}(D_{ref})} \qquad (3)$$
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Vbody, VpTV are the volumes of the body and the PTV, respectively. We 
describe the dependence of the conformal index COIN on the choice of the reference
dose value as the COIN distribution, see Fig. 2(b). Usually the dose values
are normalized to $D_{ref}$ and are given either as fractions or percentages of $D_{ref}$.
The "ideal" dose distribution is characterized by the following:

- $c_1 = c_2 = 1$, i.e. COIN = 1 at $D = D_{ref}$, which means that the reference dose
value isodose 3D envelope is identical with the PTV.

- For $D < D_{ref}$, an extremely rapid fall-off of the COIN value, which corresponds
to a rapid fall-off of the dose outside the PTV (normal tissues).

- COIN ≈ 0 for $D > D_{ref}$, which means that there are negligible volumes with
dose values higher than $D_{ref}$.

The cumulative dose-volume histograms of the PTV and the body for a rib
implant are shown in Fig. 2(a). Due to the rapid decrease of the DVH of the body,
a large number of sampling points is necessary in order to calculate with high
accuracy the DVH, the COIN distribution and the COIN integral at dose values
close to the reference dose value and above. The COIN distribution from the
DVHs of Fig. 2(a) and the COIN integral are shown in Fig. 2(b).
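The relation between the two coefficients and the cumulative DVH values can be checked with a small sketch. It assumes that the DVH values are given in percent of the respective volume; the function names and the numbers in the example are hypothetical.

```python
def coin_from_volumes(v_ptv, v_ptv_ref, v_ref):
    """COIN = c1 * c2, with c1 = V_PTV_ref / V_PTV and c2 = V_PTV_ref / V_ref."""
    c1 = v_ptv_ref / v_ptv
    c2 = v_ptv_ref / v_ref
    return c1 * c2

def coin_from_dvh(v_ptv, v_body, dvh_ptv_pct, dvh_body_pct):
    """COIN from cumulative DVH values (percent of each volume) taken at D_ref."""
    v_ptv_ref = v_ptv * dvh_ptv_pct / 100.0  # PTV volume receiving at least D_ref
    v_ref = v_body * dvh_body_pct / 100.0    # total volume receiving at least D_ref
    return coin_from_volumes(v_ptv, v_ptv_ref, v_ref)

# Hypothetical case: 50 cm^3 PTV, 20000 cm^3 body,
# 90% of the PTV and 0.3% of the body receive at least D_ref
print(coin_from_dvh(50.0, 20000.0, 90.0, 0.3))  # approximately 0.675
```

In this example $c_1 = 0.9$ and $c_2 = 45/60 = 0.75$, so a quarter of the reference isodose volume lies outside the PTV.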

2.3 Dose Statistics Based Optimization 

The DVH based optimization method requires a large number of sampling points
for the computation of the histograms and the COIN distribution and is therefore
computationally expensive. We have developed a stratified sampling approach
where the sampling points are non-uniformly distributed and which reduces the
number of required sampling points by a factor of 5-10. Even then, for implants
with 200-300 sources the optimization time can reach 1-2 hours. A comparison
of the performance with deterministic and gradient based algorithms is not practical
or not even possible. Therefore we consider another set of two objectives.
For the conformity objective we use the variance $f_S$ of the dose distribution of
sampling points uniformly distributed on the PTV surface. In order to avoid excessively
high dose values inside the PTV we require as small as possible a dose distribution
variance $f_V$ inside the PTV. Due to the source characteristics these two
objectives are competing. We use normalized variances for the two objectives:

$$f_S = \frac{1}{N_S} \sum_{i=1}^{N_S} \frac{(d_i^S - m_S)^2}{m_S^2}, \qquad f_V = \frac{1}{N_V} \sum_{i=1}^{N_V} \frac{(d_i^V - m_V)^2}{m_V^2} \qquad (4)$$

where $d_i^S$ and $d_i^V$ are the dose values at the sampling points, $m_S$ and $m_V$ the
average dose values and $N_S$, $N_V$ the corresponding numbers of sampling points.
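A minimal sketch of a normalized variance objective follows. It assumes the usual form, the variance of the sampled dose values divided by the square of their mean; the function name is hypothetical.

```python
def normalized_variance(doses):
    """(1/N) * sum((d_i - m)^2) / m^2, with m the mean dose over the N sampling points."""
    n = len(doses)
    m = sum(doses) / n
    return sum((d - m) ** 2 for d in doses) / (n * m * m)

print(normalized_variance([1.0, 1.0, 1.0]))  # 0.0 for a perfectly uniform dose
print(normalized_variance([0.5, 1.5]))       # 0.25
```

Because of the division by $m^2$, scaling all dwell weights by a common factor leaves the objective unchanged, so only the relative weights matter.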

2.4 Multiobjective Optimization with Deterministic Algorithms 

Fig. 2. (a) Dose-volume histograms of the PTV and the body as a function of dose.
(b) The corresponding COIN distribution. The shaded area to the right of $D/D_{ref}$ =
1.5 is the COIN integral. The objectives are maximum COIN value at $D = D_{ref}$ and
minimum COIN integral for the avoidance of high dose values in the PTV and the
surrounding tissue.

These objectives allow us to use deterministic gradient based algorithms. We
use a weighted sum approach for the multiobjective optimization, where for a
set of weights for the volume and surface variance we perform a single objective
optimization of $f_w$:

$$f_w = w_S f_S + w_V f_V \qquad (5)$$

where $w_S, w_V \geq 0$ are the surface and volume importance factors, respectively,
and $w_S + w_V = 1$. We used 21 optimization runs where $w_S$ varied from
0 to 1 in steps of 0.05 to determine the shape of the trade-off curve. A problem
in using deterministic optimization methods is that the solution contains a large
number of dwell weights with negative values. This is a non-physical solution.
In the past either constrained optimization methods were used or a correction
was applied by setting to 0 all negative weights in each optimization step. A
constrained optimization method increases the number of parameters by a factor
of two. The correction method for the negative weights reduces the quality
of the optimization results. We use a simple technique, replacing the decision
variables, the weights $w_k$, with the parameters $w'_k = w_k^{1/2}$. Using this mapping
technique we avoid non-feasible solutions. For this unconstrained optimization
we use the Polak-Ribiere variant of the Fletcher-Reeves algorithm or the
Broyden-Fletcher-Goldfarb-Shanno quasi-Newton based algorithm. These require the
first derivative of the objective function with respect to the decision variables to
be calculated. The derivative of the normalized variance / used by the gradient 
based optimization methods is: 



As a gradient free method we used the modified Powell method of Numerical
Recipes.
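The weighted sum sweep and the square mapping of the weights can be sketched as follows. The sketch rests on stated assumptions: the normalized variance is taken as the variance of the doses divided by the squared mean, the doses are written as $d_i = \sum_k x_k^2 K_{ik}$ with the weights replaced by squared parameters $x_k$, and under these assumptions the chain rule gives $\partial f / \partial x_k = \frac{4 x_k}{N m^3} \sum_i d_i (m K_{ik} - d_i \bar{K}_k)$, where $\bar{K}_k$ is the mean of $K_{ik}$ over the sampling points. All names are hypothetical, and the expression is a derivation under these assumptions rather than a quotation of the paper's formula.

```python
def doses(x, kernel_rows):
    """d_i = sum_k x_k^2 * K_ik; squaring keeps every dwell weight non-negative."""
    return [sum(xk * xk * kik for xk, kik in zip(x, row)) for row in kernel_rows]

def norm_var(x, kernel_rows):
    """Normalized variance of the doses at the sampling points."""
    d = doses(x, kernel_rows)
    m = sum(d) / len(d)
    return sum((di - m) ** 2 for di in d) / (len(d) * m * m)

def norm_var_grad(x, kernel_rows):
    """Analytic gradient: 4 x_k / (N m^3) * sum_i d_i (m K_ik - d_i Kbar_k)."""
    d = doses(x, kernel_rows)
    n = len(d)
    m = sum(d) / n
    grad = []
    for k, xk in enumerate(x):
        kbar = sum(row[k] for row in kernel_rows) / n
        s = sum(di * (m * row[k] - di * kbar) for di, row in zip(d, kernel_rows))
        grad.append(4.0 * xk * s / (n * m ** 3))
    return grad

def weighted_objective(x, w_s, surface_rows, volume_rows):
    """f_w = w_S f_S + w_V f_V with w_V = 1 - w_S."""
    return w_s * norm_var(x, surface_rows) + (1.0 - w_s) * norm_var(x, volume_rows)

# The 21 importance-factor pairs: w_S = 0, 0.05, ..., 1
weight_pairs = [(i / 20, 1 - i / 20) for i in range(21)]
```

Each pair in `weight_pairs` defines one single objective run; any unconstrained gradient based minimizer can then be applied to `weighted_objective`, since negative values of $x_k$ still yield non-negative weights.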

2.5 Multiobjective Optimization with Evolutionary Algorithms 

The population of our multiobjective evolutionary algorithm consists of strings 
storing a set of weights for each source dwell position. The weights are initially 
produced randomly distributed in the interval [0, 1]. A part of the population 
can be initialized, if this is possible, by solutions of deterministic algorithms. 

Three selection mechanisms can be used: the niched Pareto genetic algorithm
(NPGA) proposed by Horn and Nafpliotis, the strength Pareto evolutionary
algorithm (SPEA) by Zitzler and Thiele, and the non dominated
ranking algorithm (NRGA) by Fonseca and Fleming.

After a new population is formed, the strings of randomly selected pairs
undergo a crossover operation with a probability $P_c$ and mutation with a
probability $P_m$. We have found that $P_c$ must be larger than 0.7 and $P_m$ should be
smaller than 0.1. The size of the population should be larger than 50. Various
crossover types can be selected, such as single point, two point, and arithmetic
crossover. For the mutation operation we have also used various forms: uniform
or non-uniform mutation. We use a real representation for the gene values. A
detailed description of the genetic operators is given in reference m 

For the NPGA algorithm we use tournament selection; the tournament
population size is a free parameter and can be used to modify the
selective pressure. Tests have shown that it should normally be 10% of the population
size. For much smaller values the genetic algorithm is sensitive to fluctuations,
while much larger values can lead to premature convergence. We applied
special genetic operators for decision variables as described by Michalewicz.
Some of them offer the possibility of better performance of the genetic algorithms
in the late stage of the optimization process. For NPGA we use a sharing
mechanism described by Deb. The sharing parameter $\sigma_{share}$ is given by:



$$\sigma_{share} = \frac{0.5}{\sqrt[P]{q}} \qquad (7)$$



where q is the desired number of distinct Pareto-optimal solutions and P is 
the number of variables in the problem. 



Selecting the Solution from the Pareto Set. After the last generation
is processed by the SPEA, NRGA or NPGA algorithm, members of the population
are expected to be close to the Pareto frontier. The member of the non
dominated set which has the minimum Euclidean distance to the ideal
optimum is selected. The ideal point is defined by the minimum values $(f_1^{min}, f_2^{min})$ of each
objective function. The distance is calculated after normalizing each objective to
a maximum value of 1 using the corresponding largest objective value found in
the population. This member is presented as the solution of the optimization
process. Additionally, the members with the best result in each
objective are selected. A list is produced with the objective values for all the members of
the Pareto set. The user can also examine the dose distributions, the
dose-volume histograms and the isodose contours of every member of the population.
Based on this information about the trade-off surface of the various objectives, a
decision maker can select the best result. In our current implementation each
objective has equal priority.
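The selection rule described above can be sketched in a few lines. The sketch assumes strictly positive objective values and, for simplicity, takes the normalizing maxima from the non-dominated set passed in rather than from the whole population; the function name and the example front are hypothetical.

```python
import math

def select_compromise(front):
    """Index of the member closest (Euclidean) to the ideal point after normalization."""
    n_obj = len(front[0])
    # Scale each objective so that its largest value becomes 1
    maxima = [max(p[j] for p in front) for j in range(n_obj)]
    scaled = [[p[j] / maxima[j] for j in range(n_obj)] for p in front]
    # Ideal point: the best (minimum) scaled value of each objective
    ideal = [min(s[j] for s in scaled) for j in range(n_obj)]
    distances = [math.dist(s, ideal) for s in scaled]
    return min(range(len(front)), key=distances.__getitem__)

# Hypothetical front of (surface variance, volume variance) pairs
front = [(0.10, 0.90), (0.30, 0.35), (0.80, 0.05)]
print(select_compromise(front))  # 1
```

With equal priority for both objectives the middle member is chosen; the two extreme members are the ones reported separately as best in each objective.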



3 Results 

The dose variances are calculated from 1000-4000 quasi-randomly distributed
sampling points. For the COIN based optimization ≈ 100000 points are generated.
The distances of these points to each source dwell position r, more precisely
the inverse square distances $1/r^2$, are stored for speed maximization in look-up
tables. We assume an invariant kernel $K(r) = 1/r^2$ and ignore any spatial
anisotropy, namely attenuation and scattering effects. This dosimetric simplification
has no measurable influence on the results of the optimization.

All calculations presented in our study have been made using a value of 0.0065 for the
mutation probability $P_m$ and a value of 0.85 for the crossover probability $P_c$.
Furthermore, a uniform mutation option has been selected and a
two point crossover has been used. The selection of a two point crossover means
that the string representation of a member is cut at two random positions and the
two end parts are interchanged. This increases the efficiency of the exploitation.
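The two operators can be illustrated with a short sketch using the probabilities quoted above. It assumes real-valued genes in the interval [0, 1], as used for the initial weights; the function names and the example parents are hypothetical.

```python
import random

def two_point_crossover(a, b, rng):
    """Cut both strings at two random positions and interchange the end parts."""
    i, j = sorted(rng.sample(range(1, len(a)), 2))
    child1 = b[:i] + a[i:j] + b[j:]
    child2 = a[:i] + b[i:j] + a[j:]
    return child1, child2

def uniform_mutation(genes, p_m, rng, low=0.0, high=1.0):
    """Replace each gene, with probability p_m, by a uniform random value in [low, high]."""
    return [rng.uniform(low, high) if rng.random() < p_m else g for g in genes]

rng = random.Random(1)
p_c, p_m = 0.85, 0.0065
parent1 = [0.0] * 8
parent2 = [1.0] * 8
if rng.random() < p_c:
    child1, child2 = two_point_crossover(parent1, parent2, rng)
else:
    child1, child2 = parent1[:], parent2[:]
child1 = uniform_mutation(child1, p_m, rng)
child2 = uniform_mutation(child2, p_m, rng)
```

Interchanging the two end parts is equivalent to exchanging the middle segments, so each child keeps a contiguous block of dwell weights from one parent.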

The optimization time depends mainly on the number of dwell positions and 
the population size. For 200 dwell positions and up to 200 generations it can 
take 1 hour with an Intel Pentium III 700 MHz processor with 512 MB RAM. 

The flowchart for the COIN based optimization algorithm is shown in Fig. 3.
For each member of the population for a given generation a renormalization is
carried out according to the resulting COIN distribution, so that the maximum
COIN value is observed at $D = D_{ref}$. The dose prescription is realized at
$D_{ref}$, the isodose value resulting in the maximal conformity. This generally
results in mean normalized dose values at the surface of the PTV different from
1.0.

The multiobjective genetic algorithm, which uses dose-volume based constraints,
produces equivalent or even better results than algorithms which are
based on phenomenological methods and used in the majority of treatment planning
systems.
As an example, in Fig. 4 the multiobjective genetic algorithm provides a
solution with a more homogeneous dose distribution inside the PTV than the
conventional optimization algorithms of a treatment planning system. Due to
the large computational time for the COIN based optimization we used only the
NPGA algorithm.

For the variance based objectives we used 22 different implant cases from 
various anatomic regions. For these implants different number of catheters were 
used and their topology differed from case to case. The study aimed to assess 
the dose homogeneity and conformity and to determine if a common set of 
importance factors exists, allowing a single objective function to be used with 
these weights. 

An example of the geometry of a PTV is shown in Fig. 5(a), including the
catheters, the source dwell positions and the sampling points on the PTV surface
which define the surface variance. In Fig. 5(b) the isosurface for the prescription
dose is shown, which should have the same shape as the PTV.

The deterministic gradient based algorithms are very effective in generating
the Pareto front using a weighted sum approach. Powell's algorithm, which
does not require derivatives, is efficient only for implants with a small number of
sources. For implants with 250-300 sources the optimization time can reach a few
hours for a single objective run, whereas the gradient based algorithms require
only 1-2 minutes. Gradient based algorithms are limited by the fact that they
can be trapped in local minima, or that non-convex regions are not accessible
using the weighted sum method.

Of the evolutionary algorithms, SPEA has been found to produce the best
results, since it applies an elitism and sharing mechanism. Therefore the Pareto
fronts are more uniformly distributed as compared with NPGA. For implants
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Fig. 3. Flow diagram for the dose-volume histogram based multiobjective genetic 
algorithm. 



with a small number of sources SPEA generated solutions close to the Pareto
sets found by the deterministic algorithms. For implants with many sources
the genetic algorithms used converge in some cases to a Pareto set which is
far away from the true Pareto set. Such an example for an implant with 215
source dwell positions is shown in Fig. 6. The SPEA algorithm converges after
200 generations to a Pareto front which is very small and far from the Pareto
set generated by the gradient based algorithms. The optimization path is shown
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Fig. 4. Comparison of the COIN distributions for a breast implant from the multiob- 
jective genetic algorithm and four conventional single objective algorithms. 




Fig. 5. a) Contours of a rib implant with the catheters and the source dwell positions. 
On the PTV surface sampling points are shown at which the dose is calculated b) the 
dose isosurface obtained from the dose optimization. 
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for a set of importance factors fv,fs for the Polak-Ribiere algorithm. After 10 
iterations a point on the Pareto front is reached. 

Using random sets of decision variables we have found for this example that 
the number of function evaluations required by a random search method to 
obtain points on the Pareto front is larger than 10^° H2|. A random search 
would require 10^® times more function evaluations to generate points on the 
Pareto set found by the SPEA algorithm without initialization. Even with this 
performance the SPEA algorithm is not able to produce points on the Pareto 
front found by the deterministic methods. Using a few members initialized by the
gradient based algorithm, the multiobjective evolutionary algorithms, especially
SPEA, reproduced the Pareto fronts obtained by the deterministic algorithms,
see Fig. 6. For a more detailed comparison of the deterministic and evolutionary
algorithms see reference na. 
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Fig. 6. Pareto front obtained by the gradient based algorithm and with the SPEA 
algorithm with and without initialization. 



Fig. 7 shows the Pareto fronts for the 22 implants. For some implants an
improvement in dose homogeneity is possible without reducing the COIN value,
which is correlated with the surface variance, while for some implants there
is a strong trade-off between these two objectives. Therefore a multiobjective
optimization is essential for the dose optimization problem in brachytherapy.
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Fig. 7. Pareto fronts obtained by the gradient based algorithm for 22 implants. The 
variety shows that a single objective optimization with constant importance factors
does not always give a good result.



4 Conclusions 

We used for the first time multiobjective evolutionary anatomy based dose optimization
algorithms in HDR brachytherapy. For the COIN-based objectives
we have found that multiobjective evolutionary algorithms produce solutions
which are better than those of the conventional algorithms in treatment planning systems,
which use deterministic algorithms and catheter-oriented objectives. The latter also
have the problem of infeasible negative weights, which they avoid by a repair
mechanism or by adding special constraints to the objective functions in order to
reduce their number and the degree of the violation.

The results of various algorithms for the variance based objectives have been 
compared using a representative set of 22 implants encountered in clinical prac- 
tice. We have limited our study to cases where no critical structures are consid- 
ered. Trade-off surfaces which reveal the nature of the multiobjective problem of 
the dose optimization in brachytherapy have been obtained. Due to the variety of 
the trade-off surfaces found, which depends on the implant and complex catheter 
geometry, no common set of optimal importance factors exists. Therefore it is 
useful to determine the Pareto front and then to select a solution according to its 
characteristics. Pareto sets have been obtained by a deterministic unconstrained 
optimization method using a simple mapping technique which transforms the lin- 
ear into a quadratic optimization problem and removes infeasible solutions with 
negative dwell position weights. The gradient based algorithms, if they can be 
used, are very effective because they converge very fast and generate the Pareto 
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fronts which in most cases are much better than the Pareto front obtained by 
evolutionary multiobjective algorithms. 

If the number of objectives increases then the number of combinations using 
a weighted sum approach with deterministic algorithms increases. Deterministic 
methods are not efficient for non analytic complex objectives such as used by 
the COIN based method. When more objectives are included, a non-convex
feasible space could be the result. A combination of deterministic and evolutionary
multiobjective algorithms seems to be the best choice for a robust and
efficient multiobjective dose optimization in HDR brachytherapy. The targets 
of the dose optimization cannot be expressed uniquely by a single set of objec- 
tive functions. This is because conformity and homogeneity can be expressed 
with various functional forms and for the complex geometry of the PTV and 
the variety of topological configurations it is not known which set is the best. 
Is the COIN based dose optimization approach better than the dose-statistics 
approach using variances and if yes how much better? 

We are currently studying for various sets of objectives the Pareto fronts 
using multiobjective evolutionary algorithms and if possible in combination with 
deterministic algorithms. We expect to understand their limitations and their 
robustness and performance for the complex problem of the dose-optimization 
in brachytherapy. 
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Abstract. We formulate linguistic rule extraction as a three-objective 
combinatorial optimization problem. The three objectives are to maximize the
performance of an extracted rule set, to minimize the number of extracted rules, 
and to minimize the total length of extracted rules. The second and third 
objectives are related to comprehensibility of the extracted rule set. We describe 
and compare two genetic-algorithm-based approaches for finding non- 
dominated rule sets with respect to the three objectives of our linguistic rule 
extraction problem. One approach is rule selection where a small number of 
linguistic rules are selected from prespecified candidate rules. The other is 
genetics-based machine learning where rule sets are evolved by generating new 
rules from existing ones using genetic operations. 



1 Introduction 

Like multi-layer feedforward neural networks, fuzzy rule-based systems are universal
approximators of nonlinear functions [20, 29]. Both of them have many application
fields such as control, forecasting, modeling and classification. One advantage of fuzzy
rule-based systems over neural networks is the transparency of input-output relations
realized by fuzzy rules. While neural networks are handled as a black box, fuzzy rules
are usually written in a linguistically interpretable form such as "If $x_1$ is large and
$x_2$ is small then y is large" and "If $x_1$ is medium and $x_2$ is large then Class 2." We
refer to linguistically interpretable fuzzy rules as linguistic rules. Approximation
ability and interpretability make fuzzy rule-based systems a practically useful
modeling tool.

Fuzzy rules are obtained from domain experts as linguistic knowledge. Since 
linguistic knowledge is incomplete and inaccurate, automated rule generation 
methods from numerical data have been proposed [27, 30]. Automated tuning 
methods of fuzzy rule-based systems have been also proposed for improving their 



E. Zitzler et al. (Eds.): EMO 2001, LNCS 1993, pp. 588-602, 2001. 
© Springer-Verlag Berlin Heidelberg 2001 




Multiobjective Optimization in Linguistic Rule Extraction from Numerical Data 589 



performance [1, 3, 5, 16, 19]. Some methods [3, 19] used genetic algorithms, and 
others [1, 5, 16] were similar to the learning of neural networks. See [21] for various 
techniques of fuzzy rule-based system design. Many of these rule generation and 
tuning methods tried to maximize the performance of fuzzy rule-based systems. 

Recently, some researchers [17, 18, 22-26, 31, 32] tried to improve 
interpretability of fuzzy rule-based systems. For example, interpretability of 
membership functions was discussed in [22, 23]. The number of fuzzy rules was 
decreased in [24-26, 31]. Jin [17] pointed out the following four factors closely 
related to interpretability of fuzzy rule-based systems. 

(a) Distinguishability of a fuzzy partition. Membership functions should be clearly 
distinguishable from each other so that a linguistic term can be assigned to each 
membership function. 

(b) Consistency of fuzzy rules. Fuzzy rules in a fuzzy rule-based system should not 
be strongly contradictory to each other. 

(c) The number of fuzzy rules. It is easy to examine a small number of fuzzy rules 
while the examination of many rules is a cumbersome task. 

(d) The number of conditions in the antecedent part (i.e., if-part). It is not easy to 
understand a fuzzy rule with many antecedent conditions. 

Among these four factors, distinguishability was included in a cost function in 
regularized learning of Jin [17]. 

In this paper, we formulate linguistic rule extraction as a combinatorial 
optimization problem with three objectives: to maximize the performance of an 
extracted rule set, to minimize the number of extracted rules, and to minimize the 
total length of extracted rules. That is, the last two factors (c) and (d) are considered. 
Since we use prespecified linguistic terms with fixed membership functions, we do 
not have to consider the distinguishability of a fuzzy partition. Inconsistency of fuzzy 
rules is resolved by assigning a certainty grade to each linguistic rule. For an n- 
dimensional pattern classification problem, we try to extract linguistic rules of the 
following form: 

Rule $R_j$: If $x_1$ is $A_{j1}$ and ... and $x_n$ is $A_{jn}$ then Class $C_j$ with $CF_j$, (1) 

where $R_j$ is the label of the j-th linguistic rule, $x = (x_1, \ldots, x_n)$ is an n-dimensional 
pattern vector, $A_{ji}$ is a linguistic value (e.g., small and large) for the i-th attribute, 
$C_j$ is a consequent class, and $CF_j$ is a certainty grade in the unit interval [0, 1]. 

For finding non-dominated rule sets of our three-objective linguistic rule 
selection problem for pattern classification, we examine two schemes: a rule selection 
method by genetic algorithms and a genetics-based machine learning (GBML) 
method. These two schemes are compared with each other through computer 
simulations on a high-dimensional pattern classification problem. We also discuss 
linguistic rule extraction from numerical data for function approximation, which is 
formulated as a three-objective problem as in the case of pattern classification. 

2 Formulation of Linguistic Rule Extraction 

In this section, we formulate linguistic rule extraction from numerical data as a three- 
objective combinatorial optimization problem for pattern classification. Our basic 
idea is to simultaneously maximize classification ability and interpretability of 
extracted rule sets. 



2.1 Assumptions 

We assume that m training patterns (i.e., labeled patterns) are given as numerical data 
for an n-dimensional c-class pattern classification problem. We denote those training 
patterns as $x_p = (x_{p1}, \ldots, x_{pn})$, $p = 1, 2, \ldots, m$. For simplicity of explanation, each 
attribute value $x_{pi}$ is assumed to be a real number in the unit interval [0, 1], i.e., 
$x_{pi} \in [0, 1]$. This means that the pattern space of our pattern classification problem is 
the n-dimensional unit hypercube $[0, 1]^n$. In computer simulations of this paper, all 
attribute values are normalized into real numbers in the unit interval [0, 1]. 

We also assume that a set of linguistic values is given for describing each 
attribute. That is, we assume that a fuzzy partition of the pattern space $[0, 1]^n$ is 
given. For simplicity of explanation, we use the five linguistic values in Fig. 1 for all the 
n attributes. Of course, our approaches described in this paper are applicable to more 
general cases where a different set of linguistic values is given to each attribute. In 
such a general case, membership functions need not be triangular. They 
are specified according to domain knowledge and intuition of human experts. 



Fig. 1. Membership functions of five linguistic values (S: small, MS: medium small, M: 
medium, ML: medium large, and L: large). Axis label: membership. 

As Jin [17] pointed out, it is not easy to understand fuzzy rules with many 
antecedent conditions. Thus we use “DC: don’t care” as an additional linguistic value 
in the antecedent part of our linguistic rule in (1). Linguistic rules with many “don’t 
care” conditions can be concisely written even for high-dimensional pattern 
classification problems. The following are some examples of such linguistic rules for 
a 13-dimensional wine classification problem used in Section 5. 

$R_1$: If $x_{[?]}$ is medium and $x_{[?]}$ is medium then Class 1 with 0.56. (2) 

$R_2$: If $x_{10}$ is small then Class 2 with 0.94. (3) 

$R_3$: If $x_{[?]}$ is small then Class 3 with 0.85. (4) 



These linguistic rules are very simple and easily understood. In the above linguistic 
rules, “don’t care” conditions are omitted. 

When we use the five linguistic values in Fig. 1 and “DC: don’t care” as 
antecedent linguistic values in our linguistic rule in (1), we have $(5 + 1)^n$ 
combinations of antecedent linguistic values as follows: 



Rule $R_j$: If $x_1$ is $\{S, MS, M, ML, L, DC\}$ and ... and $x_n$ is $\{S, MS, M, ML, L, DC\}$ 
then Class $C_j$ with $CF_j$. (5) 



As shown in the Appendix, the consequent class $C_j$ and the certainty grade $CF_j$ of 
each linguistic rule can be easily specified from training patterns when its antecedent 
conditions are given. Thus our linguistic rule extraction problem can be viewed as 
finding a small number of combinations of antecedent linguistic values among the 
above $(5 + 1)^n$ combinations. The total number of possible rule sets is $2^N$, where 
$N = (5 + 1)^n$. It is not easy to examine all the possible rule sets even for a 
three-dimensional pattern classification problem. In this case, the total number of possible 
rule sets is $2^{216} \approx 1.05 \times 10^{65}$. In the case of high-dimensional problems, we can 
examine only a tiny portion of possible rule sets because the search space is huge. 
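
The size of this search space can be checked with a few lines of arithmetic. The following is only an illustrative sketch; the function names are hypothetical and not part of the original method.

```python
def count_antecedent_combinations(n_attributes, n_linguistic_values=5):
    """Number of antecedent combinations: (K + 1)^n, where the extra value is "don't care"."""
    return (n_linguistic_values + 1) ** n_attributes


def count_rule_sets(n_attributes, n_linguistic_values=5):
    """Number of subsets of the antecedent combinations: 2^N with N = (K + 1)^n."""
    return 2 ** count_antecedent_combinations(n_attributes, n_linguistic_values)


n_rules_3d = count_antecedent_combinations(3)    # 6^3 = 216 candidate rules
n_rule_sets_3d = count_rule_sets(3)              # 2^216 possible rule sets
n_rules_wine = count_antecedent_combinations(13) # 6^13 candidate rules for 13 attributes

print(n_rules_3d)
print(f"{n_rule_sets_3d:.2e}")
print(n_rules_wine)
```

Python integers have arbitrary precision, so the exact value of $2^{216}$ is available even though it is far too large to enumerate.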



2.2 Three-Objective Combinatorial Optimization Problem 

Let $S_{ALL}$ be the set of $(5 + 1)^n$ linguistic rules for our n-dimensional pattern 
classification problem. They correspond to the $(5 + 1)^n$ combinations of the five 
linguistic values in Fig. 1 and “don’t care”. Our linguistic rule extraction problem is 
to find a small number of simple linguistic rules with high classification ability from 
the rule set $S_{ALL}$, i.e., to find a compact and high-performance rule set. Because 
there is a tradeoff between compactness and performance, we try to find 
non-dominated rule sets with respect to conflicting criteria. 

We measure the classification performance of a rule set $S$ ($S \subseteq S_{ALL}$) by the 
number of correctly classified training patterns. Compactness of S is measured by two 
criteria: the number of linguistic rules in S and the total number of antecedent 
conditions in S. Of course, “don’t care” conditions are not counted among antecedent 
conditions. The number of antecedent conditions in a linguistic rule is referred to as 
its length. Thus the total number of antecedent conditions is the same as the total 
length of linguistic rules. Based on these discussions, our linguistic rule extraction 
problem is formulated as follows: 

Maximize $f_1(S)$, minimize $f_2(S)$, and minimize $f_3(S)$, (6) 

where 

$f_1(S)$: The number of training patterns correctly classified by S, 
$f_2(S)$: The number of linguistic rules in S, 
$f_3(S)$: The total length of linguistic rules in S. 

For example, when a rule set S consists of the three linguistic rules $R_1$, $R_2$ and 
$R_3$ in (2)-(4), $f_2(S)$ and $f_3(S)$ are calculated as $f_2(S) = 3$ and $f_3(S) = 4$, 
respectively. The first objective $f_1(S)$ is calculated by classifying all the m training 
patterns by the rule set S. In the Appendix, we show how each pattern is classified by 
linguistic rules (see [2] for various fuzzy reasoning methods for pattern 
classification). 

The third objective $f_3(S)$ is not the average length of extracted rules but the 
total length. Let $f_3^{*}(S)$ be the average length. For example, $f_3^{*}(S)$ is calculated as 
$f_3^{*}(S) = 4/3 \approx 1.33$ for the rule set S with $R_1$, $R_2$ and $R_3$. Let us construct another rule set 
$S^{+}$ by adding a linguistic rule $R_4$ of the length one to S. For the new rule set $S^{+}$ 
with $R_1$ to $R_4$, $f_3^{*}(S^{+})$ is calculated as $f_3^{*}(S^{+}) = 5/4 = 1.25$. That is, the average length is 
improved by adding $R_4$ to the rule set S while the complexity of the rule set is 
increased. Even if the added linguistic rule $R_4$ does not improve the classification 
performance of the rule set (i.e., $f_1(S) = f_1(S^{+})$), the new rule set $S^{+}$ is not 
dominated by the rule set S when we use the average length as the third objective. 
This simple example shows that the average length is not an appropriate objective for 
measuring the simplicity of extracted linguistic rules in the framework of 
multi-objective optimization. Thus we use the total length as the third objective $f_3(S)$. In 
[7], the average length was used for rule selection. Since three objectives were 
combined into a scalar fitness function in [7] for obtaining a single optimal rule set, 
the above-mentioned difficulty of the average length can be ignored. The difficulty of 
the average length is crucial only when this objective is used in the framework of 
multi-objective optimization for obtaining non-dominated rule sets. 
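
The argument above can be replayed numerically. The sketch below is an illustration under assumptions: the class name, the function name, and the value used for the number of correctly classified patterns (100) are hypothetical; only the rule counts and lengths come from the example in the text. Rule sets are assumed to be non-empty.

```python
from typing import NamedTuple


class RuleSetScore(NamedTuple):
    correct: int       # f1: correctly classified training patterns (maximized)
    n_rules: int       # f2: number of rules (minimized)
    total_length: int  # f3: total rule length (minimized)


def dominates(a, b, use_average=False):
    """Return True if rule set a dominates rule set b.

    With use_average=True the third objective is the average rule length
    instead of the total rule length.
    """
    def as_minimization(s):
        length = s.total_length / s.n_rules if use_average else s.total_length
        return (-s.correct, s.n_rules, length)  # negate f1 so every entry is minimized

    oa, ob = as_minimization(a), as_minimization(b)
    no_worse = all(x <= y for x, y in zip(oa, ob))
    strictly_better = any(x < y for x, y in zip(oa, ob))
    return no_worse and strictly_better


s = RuleSetScore(correct=100, n_rules=3, total_length=4)       # rules R1, R2, R3
s_plus = RuleSetScore(correct=100, n_rules=4, total_length=5)  # R1..R4, same accuracy

print(dominates(s, s_plus))                    # total length as third objective
print(dominates(s, s_plus, use_average=True))  # average length as third objective
```

With the total length, S dominates the enlarged set; with the average length it does not, because 5/4 is smaller than 4/3.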



2.3 Simple Numerical Example 

Let us consider a simple numerical example in Fig. 2 (a) where 121 training patterns 
are given in the unit square $[0, 1] \times [0, 1]$. If we use a standard grid-type fuzzy 
partition in Fig. 2 (b), we can generate $5 \times 5$ linguistic rules with no “don’t care” 
conditions. All the given training patterns are correctly classified by those 25 
linguistic rules. On the other hand, a much simpler rule set can be extracted if we 
consider linguistic rules with “don’t care” conditions. 

As shown in Subsection 2.1, the total number of possible combinations of 
antecedent linguistic values is 36 for the two-dimensional pattern classification 
problem when we use “don’t care” as an additional linguistic value. By examining 
subsets of those 36 linguistic rules, we can find that the following three linguistic 
rules correctly classify all the given training patterns: 

$R_A$: If $x_1$ is medium then Class 2 with 0.75. (7) 

$R_B$: If $x_2$ is large then Class 2 with 0.84. (8) 

$R_C$: $(x_1, x_2)$ is Class 1 with 0.19. (9) 

The last linguistic rule $R_C$ has no linguistic condition (i.e., it has two “don’t care” 
conditions). Since $R_C$ has a small certainty grade (i.e., 0.19), this rule is used for 
pattern classification only when the other two linguistic rules $R_A$ and $R_B$ are not 
applicable. In this manner, inconsistency is resolved through certainty grades. 

All the non-dominated rule sets of our linguistic rule extraction problem for this 
numerical example are shown in Table 1. Since the numerical example in Fig. 2 is 
very simple, we can find all non-dominated solutions by examining all rule sets with 
one, two or three linguistic rules. Usually we cannot find all non-dominated solutions 
by such an enumeration method, especially for high-dimensional pattern classification 
problems. In the rest of this paper, we explain how genetic algorithms can be used for 
finding non-dominated solutions of our three-objective linguistic rule extraction 
problem. 



Fig. 2. A numerical example and a grid-type fuzzy partition. Panels: (a) training patterns, 
with separate markers for Class 1 and Class 2; (b) grid-type fuzzy partition. 



Table 1. All the non-dominated solutions for the numerical example in Fig. 2. 

3 Rule Selection 



3.1 Basic Idea of Rule Selection 

We have already proposed a GA-based rule selection method [7, 15] where multiple 
objectives were combined into a scalar fitness function for applying standard 
single-objective genetic algorithms. We have also proposed two-objective genetic algorithms 
for finding non-dominated rule sets with respect to the classification performance and 
the number of linguistic rules [8]. In this paper, the total rule length is added to the 
two-objective rule selection method in [8]. 

It is not difficult to extend our former two-objective rule selection method to the 
case of three objectives. Let N be the number of linguistic rules that can be generated 
from given training patterns. Those N linguistic rules are used as candidate rules in 
rule selection. In low-dimensional pattern classification problems, all the $(5 + 1)^n$ 
linguistic rules in (5) are considered as candidate rules. Not all linguistic rules, however, 
can always be generated (e.g., when training patterns are not evenly distributed in the 
entire pattern space). In high-dimensional pattern classification problems, the number 
of candidate rules should be much smaller than $(5 + 1)^n$ because the string length is 
the same as the number of candidate rules in our rule selection method. 

Any subset S of the N candidate rules can be represented by a binary string of the 
length N as $S = s_1 s_2 \cdots s_N$. The inclusion and exclusion of the j-th candidate rule are 
represented by $s_j = 1$ and $s_j = 0$, respectively. Since every feasible solution of our 
problem is represented by a binary string, we can use various multiobjective genetic 
algorithms [28, 33, 34] for finding its non-dominated solutions. 
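
A minimal sketch of this binary coding is given below. The candidate rules, the tuple representation of antecedents, and the function names are assumptions made for illustration; the first objective is omitted because it requires the classification procedure of the Appendix.

```python
DC = "DC"  # "don't care"

# Hypothetical candidate rules for a two-dimensional problem, written as
# tuples of antecedent linguistic values (one entry per attribute).
candidate_rules = [
    (DC, DC),
    ("M", DC),
    (DC, "L"),
    ("S", "ML"),
]


def decode(bits, candidates):
    """Return the rule set encoded by a binary string (s_j = 1 selects rule j)."""
    if len(bits) != len(candidates):
        raise ValueError("string length must equal the number of candidate rules")
    return [rule for bit, rule in zip(bits, candidates) if bit == 1]


def rule_length(rule):
    """Number of antecedent conditions, not counting "don't care"."""
    return sum(1 for value in rule if value != DC)


def compactness(bits, candidates):
    """Return (f2, f3): the number of rules and the total rule length."""
    selected = decode(bits, candidates)
    return len(selected), sum(rule_length(rule) for rule in selected)


bits = [1, 1, 1, 0]
f2, f3 = compactness(bits, candidate_rules)
print(f2, f3)
```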



3.2 Candidate Rule Prescreening 

In high-dimensional pattern classification problems, we cannot handle all the 
$(5 + 1)^n$ linguistic rules in (5) as candidate rules. Thus a prescreening procedure of 
candidate rules is necessary for applying our rule selection method to 
high-dimensional problems. A simple trick is to examine only short linguistic rules with a 
few antecedent linguistic conditions [7]. This trick, which also has a good effect on 
the third objective of our rule extraction problem, can significantly decrease the 
number of candidate rules. Even when the total number of linguistic rules is huge, the 
number of short rules is not so large. For example, the number of linguistic rules of 
the length one with a single antecedent condition is calculated as ${}_{13}C_1 \times 5 = 65$ for a 
13-dimensional problem such as wine data. In our computer simulation on the wine 
data, we examine one rule of the length zero, 65 rules of the length one, and 
${}_{13}C_2 \times 5 \times 5 = 1950$ rules of the length two for generating candidate rules in our rule 
selection method. In [7], the certainty grade of each linguistic rule was also used in 
addition to the length for prescreening candidate rules. 
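
The counts quoted above follow from choosing which attributes carry a condition and which of the five linguistic values each condition uses. A short check, with a hypothetical function name:

```python
from math import comb


def count_rules_of_length(n_attributes, length, n_linguistic_values=5):
    """Number of antecedent combinations with exactly `length` non-"don't care" conditions."""
    return comb(n_attributes, length) * n_linguistic_values ** length


# Rules of length zero, one and two for a 13-dimensional problem.
counts = {length: count_rules_of_length(13, length) for length in range(3)}
total_examined = sum(counts.values())

print(counts)
print(total_examined)
```

These are the antecedent combinations that are examined. Fewer candidate rules may result, since a rule whose consequent class cannot be determined is not generated (see the Appendix).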

3.3 Domain-Specific Heuristic Procedures 

The performance of genetic algorithms for rule selection can be improved by 
incorporating domain-specific heuristic procedures. One procedure is for eliminating 
unnecessary rules. As shown in the Appendix, we use a fuzzy reasoning method based on 
a single winner rule in the classification phase. This means that each pattern is 
classified by a single winner rule in a rule set. If a linguistic rule is not used as a 
winner rule for any training pattern, that rule can be removed from the rule set with no 
deterioration of its classification performance. This elimination improves the second 
and third objectives of our linguistic rule extraction problem. Our rule elimination 
procedure removes such linguistic rules before each rule set is evaluated in the 
three-objective genetic algorithms. 
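
The elimination step might be sketched as follows. This is a simplified illustration, not the authors' implementation: the `scores` table (one row per rule, one column per training pattern, each entry being the compatibility grade multiplied by the certainty grade) and the function names are assumptions, and a tie for the maximum is treated here simply as "no winner".

```python
def winner_rule_indices(scores):
    """Return the indices of rules that are the winner for at least one pattern.

    scores[j][p] is assumed to hold the compatibility grade of rule j with
    training pattern p multiplied by the certainty grade of rule j.
    """
    used = set()
    n_patterns = len(scores[0]) if scores else 0
    for p in range(n_patterns):
        column = [row[p] for row in scores]
        best = max(column)
        # A pattern has a winner only if the best score is positive and unique.
        if best > 0.0 and column.count(best) == 1:
            used.add(column.index(best))
    return used


def eliminate_unused_rules(rule_set, scores):
    """Drop every rule that is never the winner rule for a training pattern."""
    used = winner_rule_indices(scores)
    return [rule for j, rule in enumerate(rule_set) if j in used]


# Hypothetical scores for three rules and four training patterns.
example_scores = [
    [0.9, 0.8, 0.0, 0.0],  # rule 0 wins patterns 0 and 1
    [0.1, 0.2, 0.1, 0.1],  # rule 1 never wins
    [0.0, 0.0, 0.7, 0.6],  # rule 2 wins patterns 2 and 3
]
pruned = eliminate_unused_rules(["R0", "R1", "R2"], example_scores)
print(pruned)
```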

Another trick is to bias the mutation probability. Even when we use an 
appropriate prescreening procedure, usually the number of selected rules is much 
smaller than that of candidate rules. That is, binary strings should consist of a small 
number of 1’s and a large number of 0’s. The standard mutation tends to increase the 
number of 1’s when binary strings have many more 0’s than 1’s. For efficiently 
searching for binary strings with a small number of 1’s, we use biased mutation 
probabilities where the mutation probability from 1 to 0 is much larger than that from 
0 to 1. 
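
A biased bit-flip mutation of this kind could look like the sketch below. The function name and the probability values are assumptions chosen for illustration; the text does not state the values used at this point.

```python
import random


def biased_mutation(bits, p_one_to_zero, p_zero_to_one, rng):
    """Flip each bit with a probability that depends on its current value."""
    mutated = []
    for bit in bits:
        p_flip = p_one_to_zero if bit == 1 else p_zero_to_one
        mutated.append(1 - bit if rng.random() < p_flip else bit)
    return mutated


rng = random.Random(0)
parent = [1, 0, 0, 0, 1, 0, 0, 0, 0, 0]
# Hypothetical probabilities: removing a rule is far more likely than adding one.
child = biased_mutation(parent, p_one_to_zero=0.1, p_zero_to_one=0.001, rng=rng)
print(child)
```

With equal flip probabilities, a string of mostly 0's gains 1's on average, because there are many more 0's available to flip. The asymmetric probabilities counteract this drift.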



4 Genetics-Based Machine Learning 



4.1 Basic Idea of Genetics-Based Machine Learning 

The quality of non-dominated rule sets obtained by rule selection strongly depends on 
the choice of a prescreening procedure. While some studies [4, 12] showed high 
performance of short rules with only a few antecedent conditions, this is not always 
the case. Some pattern classification problems may need long rules as well as short 
rules. In this case, the search among $(5 + 1)^n$ linguistic rules is necessary for finding 
good rule sets. Genetics-based machine learning (GBML) algorithms are promising 
tools for finding non-dominated rule sets in the huge search space. 

We have already proposed Michigan-style GBML algorithms for generating 
linguistic rules for high-dimensional pattern classification problems [9, 14]. In our 
Michigan-style algorithm, a single linguistic rule was coded as a string. A population 
with a fixed number of linguistic rules was evolved by genetic operations for finding 
good linguistic rules. Since our objectives in this paper are not only classification 
performance but also the number of linguistic rules and the total rule length, the 
number of linguistic rules should not be fixed. It is, however, difficult to directly 
optimize the number of linguistic rules in the framework of Michigan-style algorithms 
because a fitness value is assigned to each linguistic rule. Thus we use a 
Pittsburgh-style algorithm with variable string length for finding non-dominated rule sets. 

4.2 Genetic Operations 

As in our Michigan-style algorithms in [9, 14], each linguistic rule is coded by its 
antecedent linguistic values as “$A_{j1} A_{j2} \cdots A_{jn}$”. Since the consequent class and the 
certainty grade are easily specified by training patterns, they are not coded as a part of 
the string. A rule set is denoted by a concatenated string. Each substring of the 
concatenated string corresponds to a single linguistic rule. 

A new rule set (i.e., a new string) is generated by crossover and mutation. In our 
computer simulation, we use a kind of one-point crossover shown in Fig. 3, which 
changes the number of linguistic rules in each rule set. This crossover operation 
randomly selects a different cutoff point for each parent to form an offspring. A 
mutation operation randomly replaces each element (i.e., each antecedent linguistic 
value) of the string with another linguistic value. Elimination of existing rules and 
addition of new rules can be also used as mutation operations. Such mutation 
operations change the number of linguistic rules in each string. 



Fig. 3. Crossover operation. (Diagram: Parent 1 with rules R1 to R6 and Parent 2 with rules 
RA to RF are each cut at their own cutoff point, and a child is formed from the resulting substrings.) 
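
The genetic operations of this subsection can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: a rule set is represented as a list of rules, each rule as a tuple of antecedent values, cutoff points are assumed to fall on rule boundaries, the child is assumed to take the part before the cut in the first parent and the part after the cut in the second, and an empty child is replaced by a copy of the first parent. All names are hypothetical.

```python
import random

LINGUISTIC_VALUES = ("S", "MS", "M", "ML", "L", "DC")


def crossover(parent1, parent2, rng):
    """One-point crossover with an independent cutoff point in each parent.

    Because the two cutoff points differ, the child can contain a different
    number of rules than either parent.
    """
    cut1 = rng.randint(0, len(parent1))  # randint is inclusive at both ends
    cut2 = rng.randint(0, len(parent2))
    child = parent1[:cut1] + parent2[cut2:]
    # Assumption: an empty child is replaced by a copy of the first parent.
    return child if child else list(parent1)


def mutate(rule_set, p_mutation, rng, values=LINGUISTIC_VALUES):
    """Replace each antecedent value, with probability p_mutation, by another value."""
    mutated = []
    for rule in rule_set:
        new_rule = tuple(
            rng.choice([v for v in values if v != a]) if rng.random() < p_mutation else a
            for a in rule
        )
        mutated.append(new_rule)
    return mutated


rng = random.Random(0)
parent1 = [("S", "DC"), ("M", "L"), ("DC", "DC")]
parent2 = [("L", "L"), ("DC", "MS")]
child = mutate(crossover(parent1, parent2, rng), p_mutation=0.1, rng=rng)
print(child)
```

Operations that add or delete whole rules, mentioned in the text as further mutation operations, are not included in this sketch.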



4.3 Hybridization with Michigan-Style Algorithm 

The search ability of Pittsburgh-style algorithms to find good linguistic rules is 
somewhat inferior to that of Michigan-style algorithms when they are applied to 
high-dimensional pattern classification problems [11, 13]. Michigan-style algorithms, 
however, cannot directly optimize rule sets because a fitness value is assigned to each 
linguistic rule (not to each rule set). On the other hand, the Pittsburgh approach can 
directly optimize rule sets. A natural idea for utilizing the advantages of these two kinds 
of GBML approaches is to combine them into a single hybrid algorithm [10, 11]. A 
Michigan-style algorithm, which is used as a mutation operation in our 
Pittsburgh-style algorithm, partially modifies each rule set by generating new rules. By this 
hybridization, the ability of our Pittsburgh-style algorithm to efficiently find good 
linguistic rules is significantly improved. In our computer simulation, a single 
iteration of a Michigan-style algorithm was applied with a prespecified probability to 
every rule set generated by the genetic operations of our Pittsburgh-style algorithm. 

5 Computer Simulations 



5.1 Test Problem and Simulation Conditions 

We applied the rule selection method and the hybrid GBML algorithm to wine data 
(available from UC Irvine Database: http://kdd.ics.uci.edu/) for finding non- 
dominated rule sets of our linguistic rule extraction problem. These two algorithms 
were implemented in the framework of three-objective genetic algorithms. The three 
objectives were combined into the following fitness function, which was used in a 
roulette wheel selection with a linear scaling. 

$fitness(S) = w_1 \cdot f_1(S) - w_2 \cdot f_2(S) - w_3 \cdot f_3(S)$. (10) 

For finding various non-dominated solutions, weights $w_1$, $w_2$ and $w_3$ were not 
fixed but randomly updated whenever a pair of parent strings was selected, as in our 
two-objective genetic algorithm in [8]. Non-dominated solutions were stored 
separately from the current population. Some of the stored non-dominated solutions were 
added to the current population for maintaining its diversity and quality. 
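
The weighted fitness in (10) and the separate store of non-dominated solutions can be sketched as below. The normalization of the random weights to sum to one, the archive update rule, and all function names are assumptions for illustration; the text only states that the weights are random and that non-dominated solutions are stored separately.

```python
import random


def random_weights(rng):
    """Draw three non-negative weights and normalize them to sum to one (an assumption)."""
    raw = [rng.random() for _ in range(3)]
    total = sum(raw)
    if total == 0.0:
        return (1 / 3, 1 / 3, 1 / 3)
    return tuple(r / total for r in raw)


def fitness(objectives, weights):
    """Scalar fitness of (10): w1*f1 - w2*f2 - w3*f3."""
    f1, f2, f3 = objectives
    w1, w2, w3 = weights
    return w1 * f1 - w2 * f2 - w3 * f3


def dominates(a, b):
    """a and b are (f1, f2, f3); f1 is maximized, f2 and f3 are minimized."""
    no_worse = a[0] >= b[0] and a[1] <= b[1] and a[2] <= b[2]
    better = a[0] > b[0] or a[1] < b[1] or a[2] < b[2]
    return no_worse and better


def update_archive(archive, candidate):
    """Return the archive of non-dominated objective vectors after seeing candidate."""
    if any(dominates(kept, candidate) or kept == candidate for kept in archive):
        return archive
    return [kept for kept in archive if not dominates(candidate, kept)] + [candidate]


rng = random.Random(0)
weights = random_weights(rng)  # redrawn each time a pair of parents is selected

archive = []
for objectives in [(161, 3, 3), (163, 3, 4), (160, 3, 4), (163, 3, 4)]:
    archive = update_archive(archive, objectives)
print(archive)
```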

Both the rule selection method and the hybrid GBML algorithm were executed 
under the following parameter specifications: 

Population size: 50 rule sets. 

Stopping conditions: 1000 generations. 

That is, a population of 50 rule sets was evolved until the 1000th generation in both 
algorithms. We applied each algorithm to wine data 20 times. In the following 
subsections, we report non-dominated solutions obtained from those 20 trials. 



5.2 Simulation Results by Rule Selection 

Since wine data involve 13 attributes, the number of possible combinations of 
antecedent linguistic values is $(5 + 1)^{13} \approx 1.3 \times 10^{10}$ (i.e., about 13 billion). It is 
impossible to apply our rule selection method to candidate rules generated from all 
those combinations. We examined only short linguistic rules of the length two or less. 
By this prescreening procedure, 1834 linguistic rules were generated as candidate 
rules. Thus each rule set was represented by a binary string of the length 1834. The 
task of our rule selection method is to find non-dominated rule sets from those 
candidate rules. From 20 trials, we obtained 17 non-dominated rule sets. Due to the 
space limitation, we show only the 12 rule sets with high classification rates in Table 2. 
Very small rule sets are not shown in this table (e.g., those with only two rules). From Table 2, we can 
see that our rule selection method found various non-dominated rule sets. Some are 
very compact, and others have high classification rates. We can observe a tradeoff 
between the classification performance and the compactness of rule sets. 

Table 2. Non-dominated rule sets obtained by the GA-based rule selection method. 

# of rules                   3     3     3     4     4     4     5     5     5     6     7     9 
Total length                 3     4     5     4     5     6     5     6     7     9    11    17 
Average length            1.00  1.33  1.67  1.00  1.25  1.50  1.00  1.20  1.40  1.50  1.57  1.89 
# of patterns              161   163   164   169   170   171   172   173   175   176   177   178 
Classification rate (%)   90.4  91.6  92.1  94.9  95.5  96.1  96.6  97.2  98.3  98.9  99.4   100 




5.3 Simulation Results by Genetics-Based Machine Learning 



The search space in the hybrid GBML algorithm for wine data consists of $(5 + 1)^{13}$ 
linguistic rules. This is much larger than that of the rule selection method where 
non-dominated rule sets were selected from 1834 candidate rules. From 20 trials, we 
obtained 18 non-dominated rule sets. Twelve rule sets with high classification rates 
are shown in Table 3. From the comparison between Table 2 and Table 3, we can see 
that similar results were obtained by the two algorithms even though the search space in the 
hybrid GBML algorithm was vastly larger. Some non-dominated rule sets obtained by 
the hybrid GBML algorithm include linguistic rules of the length three, which were 
not considered in the rule selection method. 



Table 3. Non-dominated rule sets obtained by the hybrid GBML algorithm. 

# of rules                   3     3     4     4     4     5     5     5     6     6     7     8 
Total length                 4     5     4     5     6     5     6     7     6    10     8    15 
Average length            1.33  1.67  1.00  1.25  1.50  1.00  1.20  1.40  1.00  1.67  1.14  1.88 
# of patterns              163   165   169   170   171   172   173   174   174   175   176   177 
Classification rate (%)   91.6  92.7  94.9  95.5  96.1  96.6  97.2  97.8  97.8  98.3  98.9  99.4 




6 Linguistic Rule Extraction for Function Approximation 

Our three-objective linguistic rule extraction problem can be easily modified for the 
application to function approximation. For an n-input and single-output function 
approximation problem, we use linguistic rules of the following type: 

Rule $R_j$: If $x_1$ is $A_{j1}$ and ... and $x_n$ is $A_{jn}$ then $y$ is $B_j$, (11) 

where $x = (x_1, \ldots, x_n)$ is an input vector, $A_{ji}$ is an antecedent linguistic value, $y$ is 
an output variable, and $B_j$ is a consequent linguistic value. Since “don’t care” is 
used only in the antecedent part, the total number of possible combinations of 
antecedent and consequent linguistic values is $(5 + 1)^n \times 5$. In the rule selection 
scheme, some of those combinations are examined for generating candidate rules. In 
the GBML scheme, each linguistic rule is denoted by its antecedent and consequent 
linguistic values as “$A_{j1} \cdots A_{jn} B_j$”. A rule set is denoted by a concatenated string. 

Let us assume that m input-output pairs $(x_p, y_p)$, $p = 1, 2, \ldots, m$, are given as 
numerical data where $x_p = (x_{p1}, \ldots, x_{pn})$ is an input vector and $y_p$ is the 
corresponding output value. The performance of a rule set S can be measured by some 
cost function $f_1(S)$ such as the total squared error or the total absolute error: 

$f_1(S) = \sum_{p=1}^{m} (\hat{y}_S(x_p) - y_p)^2 / 2$ or $f_1(S) = \sum_{p=1}^{m} |\hat{y}_S(x_p) - y_p|$, (12) 

where $\hat{y}_S(x_p)$ is the estimated output by the rule set S (see [6]). 
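
Both error measures in (12) are straightforward to compute once the estimated outputs are available. The sketch below takes the estimates as given, since the fuzzy reasoning method that produces them is described in [6] and not here; the function names and sample values are hypothetical.

```python
def total_squared_error(estimates, targets):
    """First form of (12): sum over p of (estimate_p - y_p)^2, divided by 2."""
    return sum((est - y) ** 2 for est, y in zip(estimates, targets)) / 2


def total_absolute_error(estimates, targets):
    """Second form of (12): sum over p of |estimate_p - y_p|."""
    return sum(abs(est - y) for est, y in zip(estimates, targets))


# Hypothetical estimated outputs and target outputs for m = 3 pairs.
estimates = [1.0, 2.0, 4.0]
targets = [1.0, 3.0, 2.0]
print(total_squared_error(estimates, targets))
print(total_absolute_error(estimates, targets))
```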

Linguistic rule extraction for function approximation can be formulated as the 
following three-objective combinatorial optimization problem: 

Minimize $f_1(S)$, $f_2(S)$, and $f_3(S)$, (13) 

where $f_2(S)$ and $f_3(S)$ are the number of linguistic rules in S and the total rule 
length in S, respectively, as in the case of pattern classification. Our task is to find 
non-dominated rule sets of the three-objective linguistic rule selection problem. 

For this task, we can use the rule selection scheme and the GBML scheme 
described for pattern classification. The following points should be considered when 
we apply these schemes to function approximation: 

(a) The first objective is an error measure such as (12). The magnitude of output 
values should be taken into account when we use the weighted averaging 
technique for combining the three objectives into a scalar fitness function. 

(b) We should use a fuzzy reasoning method that can handle a rule set of linguistic 
rules with different specificity levels because our linguistic rules involve an 
arbitrary number of “don’t care” conditions. Inconsistency among linguistic rules 
may be resolved by such a fuzzy reasoning method (see [6]). 

(c) Since the consequent part of each linguistic rule is not uniquely specified, its 
consequent linguistic value is coded together with its antecedent linguistic values 
in the GBML scheme. This modification increases the search space from $6^n$ to 
$5 \times 6^n$. Usually an appropriate consequent value for each linguistic rule can be 
limited to a few alternatives. This can be used for decreasing the search space. 



7 Conclusion 

We first formulated linguistic rule extraction from numerical data for pattern 
classification as a three-objective combinatorial optimization problem. Next we 
explained how non-dominated rule sets can be obtained from candidate rules by 
genetic algorithms. In the rule selection method, a prescreening procedure of 
candidate rules is necessary when it is applied to high-dimensional pattern 
classification problems. Then we explained how non-dominated rule sets can be 
found by a genetics-based machine learning algorithm in a huge search space with all 
possible linguistic rules. These two schemes were applied to wine data with 13 
attributes. Simulation results showed that compact rule sets with high classification 
performance were found. Finally we modified our linguistic rule extraction problem 
for applying it to function approximation. 

Appendix: Rule Generation and Pattern Classification 

The consequent class $C_j$ and the certainty grade $CF_j$ of our linguistic rule in (1) can 
be determined by the following heuristic procedure (for example, see [15]): 

Step 1) Calculate the compatibility grade of each training pattern $x_p = (x_{p1}, \ldots, x_{pn})$ 
with the linguistic rule $R_j$ by the product operation as 

$\mu_{R_j}(x_p) = \mu_{j1}(x_{p1}) \times \cdots \times \mu_{jn}(x_{pn})$, $p = 1, 2, \ldots, m$, (A1) 

where $\mu_{ji}(\cdot)$ is the membership function of the antecedent linguistic value $A_{ji}$. 

Step 2) For each class, calculate the total compatibility grade of the training patterns 
with the linguistic rule $R_j$: 

$\beta_{\text{Class } h}(R_j) = \sum_{x_p \in \text{Class } h} \mu_{R_j}(x_p)$, $h = 1, 2, \ldots, c$. (A2) 

Step 3) Find the consequent class $C_j$ that has the maximum value of $\beta_{\text{Class } h}(R_j)$: 

$\beta_{\text{Class } C_j}(R_j) = \max\{\beta_{\text{Class } 1}(R_j), \ldots, \beta_{\text{Class } c}(R_j)\}$. (A3) 

If the consequent class cannot be uniquely determined, we do not extract the 
linguistic rule $R_j$. For example, if $\beta_{\text{Class } h}(R_j) = 0$ for all classes, we do 
not generate $R_j$. 

Step 4) Specify the certainty grade $CF_j$ as follows: 

$CF_j = (\beta_{\text{Class } C_j}(R_j) - \bar{\beta}) / \sum_{h=1}^{c} \beta_{\text{Class } h}(R_j)$, (A4) 

where 

$\bar{\beta} = \sum_{h=1, h \neq C_j}^{c} \beta_{\text{Class } h}(R_j) / (c - 1)$. (A5) 

Let S be a set of generated linguistic rules. A new pattern $x_p = (x_{p1}, \ldots, x_{pn})$ is 
classified by S using a fuzzy reasoning method based on a single winner rule [15]. 
The winner rule $R_{j^*}$ for $x_p = (x_{p1}, \ldots, x_{pn})$ is determined as follows: 

$\mu_{R_{j^*}}(x_p) \cdot CF_{j^*} = \max\{\mu_{R_j}(x_p) \cdot CF_j \mid R_j \in S\}$. (A6) 

The new pattern $x_p$ is classified by the winner rule $R_{j^*}$. That is, $x_p$ is assigned 
to the consequent class of $R_{j^*}$. If multiple linguistic rules with different consequent 
classes have the same maximum value in (A6), the classification of $x_p$ is rejected. 
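
The procedures of this Appendix can be put together in a short sketch. Several points are assumptions made for illustration and are not taken from the text: the five membership functions are taken to be evenly spaced triangles on [0, 1] with peaks at 0, 0.25, 0.5, 0.75 and 1; “don’t care” is given membership 1 everywhere; classes are numbered from 0; and a pattern that is compatible with no rule is rejected. All function names are hypothetical.

```python
from math import prod

DC = "DC"
# Assumed peaks of evenly spaced triangular membership functions on [0, 1].
PEAKS = {"S": 0.0, "MS": 0.25, "M": 0.5, "ML": 0.75, "L": 1.0}
HALF_WIDTH = 0.25


def membership(value, x):
    """Membership grade of x in a linguistic value ("don't care" always gives 1)."""
    if value == DC:
        return 1.0
    return max(0.0, 1.0 - abs(x - PEAKS[value]) / HALF_WIDTH)


def compatibility(antecedent, pattern):
    """(A1): product of the membership grades over all attributes."""
    return prod(membership(a, x) for a, x in zip(antecedent, pattern))


def generate_rule(antecedent, patterns, labels, n_classes):
    """Steps 1-4: return (consequent class, certainty grade), or None if no rule is generated."""
    beta = [0.0] * n_classes
    for pattern, label in zip(patterns, labels):
        beta[label] += compatibility(antecedent, pattern)  # (A2)
    best = max(beta)
    if best == 0.0 or beta.count(best) > 1:
        return None  # the consequent class cannot be uniquely determined
    consequent = beta.index(best)                          # (A3)
    beta_bar = (sum(beta) - best) / (n_classes - 1)        # (A5)
    certainty = (best - beta_bar) / sum(beta)              # (A4)
    return consequent, certainty


def classify(rules, pattern):
    """(A6): single winner rule; rules is a list of (antecedent, class, certainty grade)."""
    best_score, winner_classes = 0.0, set()
    for antecedent, consequent, certainty in rules:
        score = compatibility(antecedent, pattern) * certainty
        if score > best_score:
            best_score, winner_classes = score, {consequent}
        elif score == best_score and score > 0.0:
            winner_classes.add(consequent)
    # Rejected (None) if no rule fires or if winners disagree on the class.
    return winner_classes.pop() if len(winner_classes) == 1 else None


# Hypothetical one-dimensional training data with two classes.
patterns = [(0.0,), (0.1,), (1.0,)]
labels = [0, 0, 1]
rules = []
for antecedent in [("S",), ("L",), (DC,)]:
    result = generate_rule(antecedent, patterns, labels, n_classes=2)
    if result is not None:
        rules.append((antecedent, *result))
print(rules)
print(classify(rules, (0.9,)))
```

In this toy data the rule with no condition receives a small certainty grade, so it only decides the class where the more specific rules have low compatibility, which mirrors the role of the third rule in Subsection 2.3.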

Abstract. While there are many factors that determine the commercial potential 
of an electric light source, color and efficiency are arguably the most important. 
Tradeoffs between color and efficiency are frequently made in lighting 
applications, typically by moving between different light source technologies. 
However, the potential exists to change position in color-efficiency space by 
filtering a light source. Because color is specified in two dimensions, and 
efficiency in one, the Pareto-optimal color and efficiency front defines a 
surface. This paper presents a method for determining color-efficiency Pareto 
optimal surface for a filtered light source. 



1 Introduction 

There are many factors that determine the commercial potential of an electric light 
source, including color, efficiency, color rendering index, color temperature, and the 
reliability and lifetime of the source. While the success or failure of novel light source 
technologies in the marketplace is dependent on all of these factors, color and 
efficiency are arguably the most important factors. 

Tradeoffs between color and efficiency are made in different applications. For 
example, low pressure sodium lamps, which have "bad" color but high efficiency are 
used in many outdoor applications; meanwhile, incandescent lamps, which have 
"good" color but low efficiency are used in many indoor applications. These examples 
switch between light source technologies to move around the color-efficiency Pareto 
optimal surface; however, by filtering a light source, it is also possible to move about 
on the Pareto optimal surface. In cases where the efficiency of a light source is high, 
but the color is undesirable, it might be possible to filter the light such that the 
efficiency is reduced by a small amount, but the color becomes acceptable. The goal 
of this research is to develop a technique for determining the color-efficiency Pareto 
optimal surface for filtered light sources. 

2 Characterization of Light Sources 

Various radiometric, photometric, and colorimetric properties of a light source can be 
determined from the source’s spectral power distribution (SPD) [1], the radiant power 
per unit wavelength as a function of wavelength. The SPDs of the three light sources 
used in this paper are plotted in Figure 1. Note that SPDs can be smooth and 
continuous (e.g., incandescent lamps, sulfur lamps), or more spiky, with power either 
spread throughout the visible spectrum (e.g., metal halide lamps, fluorescent lamps), 
or concentrated principally in one portion of the visible spectrum (e.g., high pressure 
sodium lamps, low pressure sodium lamps). 




Fig. 1. SPD for metal halide (MH), high pressure sodium (HPS) and incandescent (Inc) lamps.
Horizontal axis: wavelength (nm).



2.1 The CIE System

The International Commission on Illumination (CIE¹) is the ISO recognized body for
all matters regarding the science and art of lighting. The CIE has developed [2] a 
method to represent the color and brightness of a light source from its SPD. The SPD 
is weighted by the color matching functions and integrated over the visible spectrum 
to provide the tristimulus values, X, Y and Z: 



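¹ After the French, 'Commission Internationale de l'Éclairage'.

$$X = k \int_{\lambda} P(\lambda)\, \bar{x}(\lambda)\, d\lambda \tag{1}$$

$$Y = k \int_{\lambda} P(\lambda)\, \bar{y}(\lambda)\, d\lambda \tag{2}$$

$$Z = k \int_{\lambda} P(\lambda)\, \bar{z}(\lambda)\, d\lambda \tag{3}$$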

where $\bar{x}(\lambda)$, $\bar{y}(\lambda)$, and $\bar{z}(\lambda)$ are the color matching
functions (weighting factors, Fig. 2), $P(\lambda)$ is the power at wavelength $\lambda$, and
$k$ is an application-specific constant (set to 1 for the work presented here). The Y
tristimulus value is proportional to the brightness of the light source. Efficiency is
defined as the ratio of
the Y tristimulus value of a filtered light source to the Y tristimulus value of the same 
unfiltered light source (i.e., "how bright is the filtered light source compared to the 
original light source"). Note that because this is a measure of relative efficiency 
(rather than absolute efficiency), efficiency as defined here should not be directly 
compared between lamps. 
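
The relative efficiency defined above can be sketched numerically as follows. This is a minimal illustration, assuming the SPD, the filter transmittance and the $\bar{y}$ color matching function are sampled at the same equally spaced wavelengths and that the integral is approximated by a simple Riemann sum; the function names and the three-band toy data are hypothetical and are not real CIE or lamp values.

```python
from typing import Sequence


def tristimulus_y(spd: Sequence[float], y_bar: Sequence[float],
                  bin_width_nm: float, k: float = 1.0) -> float:
    """Approximate Y = k * integral of P(lambda) * y_bar(lambda) by a Riemann sum."""
    if len(spd) != len(y_bar):
        raise ValueError("spd and y_bar must be sampled at the same wavelengths")
    return k * bin_width_nm * sum(p * w for p, w in zip(spd, y_bar))


def relative_efficiency(spd: Sequence[float], transmittance: Sequence[float],
                        y_bar: Sequence[float], bin_width_nm: float) -> float:
    """Ratio of Y of the filtered source to Y of the same unfiltered source."""
    filtered = [p * t for p, t in zip(spd, transmittance)]
    return (tristimulus_y(filtered, y_bar, bin_width_nm)
            / tristimulus_y(spd, y_bar, bin_width_nm))


# Toy three-band data (illustrative only).
spd = [1.0, 2.0, 1.0]
y_bar = [0.1, 1.0, 0.1]
print(relative_efficiency(spd, [1.0, 1.0, 1.0], y_bar, 2.0))  # no filtering
print(relative_efficiency(spd, [0.0, 1.0, 0.0], y_bar, 2.0))  # only the middle band passes
```

Because the constant $k$ and the bin width cancel in the ratio, the relative efficiency depends only on the shapes of the SPD, the filter and $\bar{y}$.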




Fig. 2. CIE color matching functions.

The tristimulus values can be normalized:

$$x = \frac{X}{X + Y + Z} \tag{4}$$

$$y = \frac{Y}{X + Y + Z} \tag{5}$$

to yield the chromaticity coordinates, x and y, which describe the color of the light 
source. The universe of possible chromaticity coordinates is referred to here as ’color 
space’. 

While an SPD has a unique set of chromaticity coordinates, a certain set of
chromaticity coordinates can be produced by an infinite number of SPDs. Moreover, a 
light source might be filtered any number of ways to reach a point in color space - 
what will vary (among other properties, such as color rendering) is the brightness 
parameter of the light source, Y, and, therefore, the relative efficiency of the filtered 
light source. For any point in color space achievable by a light source, there is a way 
to filter the light source such that the efficiency at that color is maximized. 

While only two factors, color and efficiency, are considered in this paper, because 
color is inherently two dimensional, the universe of non-dominated solutions results 
in a surface rather than a line. The surface is defined by the chromaticity coordinates 
and the maximum achievable efficiency at that point. 
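
As a small illustration of the normalization, the sketch below maps tristimulus values to chromaticity coordinates. The function name is hypothetical; the check at the end only shows that scaling all three tristimulus values by the same factor (a brighter or dimmer source of the same spectral shape) leaves the color unchanged.

```python
def chromaticity(X: float, Y: float, Z: float) -> tuple:
    """Return the chromaticity coordinates (x, y) of tristimulus values X, Y, Z."""
    total = X + Y + Z
    if total == 0:
        raise ValueError("chromaticity is undefined for a source with no power")
    return X / total, Y / total


# Scaling the tristimulus values changes brightness but not color.
print(chromaticity(1.0, 2.0, 1.0))
print(chromaticity(2.0, 4.0, 2.0))
```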



3 Optimization Approach 

Genetic algorithms (GAs) were used to solve this multiobjective optimization 
problem. A sampling technique was used to fully explore color space, which ensures 
that areas near the spectrum locus are properly evaluated. However, this does induce a 
cost in computation time over other methods, which can be partially mitigated by 
sampling more frequently in areas of rapid change in efficiency and less frequently in 
areas of slower change. A grid of points in color space was cycled through, with each
point acting as the target color for a full run of a GA. The GA Optimization
Toolbox [3] was used to implement the GAs for this research.



3.1 GA Implementation 

The function: 

fitness = efficiency - 2.5*(distance/0.08)^ (6) 

was used to evaluate the fitness of chromosomes, where 'distance' is the Euclidean
distance in color space from the target color. This fitness metric incorporates both
objectives (efficiency and color); the color penalty is small in a small region near the
target color (to give the GA room to search the efficiency dimension) and grows large
away from the target (encouraging the GA to stay away from unfruitful areas in the
color space). This has been shown [4] to be a good fitness function for finding an
efficient filter near a given color.
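
The penalized fitness can be sketched as below. The exponent of the distance term is not legible in the source, so it is left as a required argument rather than assumed; the function and parameter names are hypothetical.

```python
import math


def fitness(efficiency: float, xy: tuple, target_xy: tuple, exponent: float,
            weight: float = 2.5, scale: float = 0.08) -> float:
    """Efficiency minus a color penalty that grows with distance from the target.

    `exponent` must be supplied by the caller: its value is not recoverable
    from the source text, so no default is assumed here.
    """
    distance = math.hypot(xy[0] - target_xy[0], xy[1] - target_xy[1])
    return efficiency - weight * (distance / scale) ** exponent


# On target there is no penalty; at a distance equal to `scale` the penalty
# equals `weight`, whatever exponent is chosen.
print(fitness(0.8, (0.30, 0.30), (0.30, 0.30), exponent=2))
print(fitness(0.8, (0.38, 0.30), (0.30, 0.30), exponent=2))
```

Whatever the exponent, the penalty reaches 2.5 at a distance of 0.08, which exceeds the maximum possible efficiency of 1; colors that far from the target therefore always score below zero.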

A floating-point representation was used for this problem. The visible spectrum 
(400 to 700 nm) was partitioned into 151 bins, each 2 nm wide. Each chromosome
consisted of 151 genes, where each gene represents the transmittance of the filter over 
a 2 nm band of the visible spectrum. The 2 nm bin width was chosen as a compromise 
between smoothness and computational tractability. Valid allele values for each gene 
could range from zero to one (i.e., 0% to 100% transmission at that wavelength). A 
population size of 50 chromosomes was used, which was initialized with uniform 
random gene values. A run was terminated after 500 generations. 

Three mechanisms of crossover were applied simultaneously: (i) single-point
crossover [3]; (ii) arithmetic crossover (produces two complementary linear
combinations of the parents) [3]; and (iii) heuristic crossover (crossover based on
interpolation, moving in the direction of the fitter chromosome) [3]. Three
problem-specific mutation methods were applied: (i) boundary chunk mutation (BCM);
(ii) push mutation (PM); and (iii) smooth mutation (SM).

BCM, PM, and SM are mutation methods that were developed specifically for the 
spectrum optimization problem, based on two properties observed in optimal 
solutions. First, many gene values are exactly at the limits of the allele (i.e., 100% 
transmission or 0% transmission) in relatively fit chromosomes. Second, adjacent 
wavelength bins have nearly the same value (smoothness). These mutation methods
were based on the "chromosome smoothing" technique [4], which has been shown to
increase the convergence rate dramatically for this type of problem. However, a
limitation of this technique is that it tends to drive the solutions toward a particular
result - more or less "single smooth notch" solutions. While this is suitable for many
applications, it is not an ideal result. Because the problem-specific mutations are
applied probabilistically (rather than to every chromosome), it was expected that
this unwanted side effect would be mitigated.

The boundary chunk mutation selects a random contiguous portion of the
chromosome (up to 10% of the total length) and sets it to one of the limits (either 1
or 0). This mutation was expected to be effective because many of the genes of fit 
solutions were at the maximum or minimum value (i.e., 100% transmission, or 0%). 
Moreover, because most of the genes for any color could be expected to be at 100% 
transmission, the selection of which boundary to mutate to was biased slightly: 65% 
of the time it went to 1, 35% of the time it went to 0. 
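
A minimal sketch of the boundary chunk mutation described above is given here. How the chunk length and position are drawn is not specified in the text; uniform sampling is assumed, and the function and parameter names are hypothetical.

```python
import random


def boundary_chunk_mutation(chromosome, rng, max_fraction=0.10, p_one=0.65):
    """Set a random contiguous chunk of genes to a boundary value (0 or 1).

    Assumptions: the chunk length is uniform in [1, max_fraction * n] and the
    start index is uniform over all valid positions.
    """
    n = len(chromosome)
    max_len = max(1, int(max_fraction * n))
    length = rng.randint(1, max_len)
    start = rng.randint(0, n - length)
    value = 1.0 if rng.random() < p_one else 0.0  # biased toward 100% transmission
    mutated = list(chromosome)
    mutated[start:start + length] = [value] * length
    return mutated


rng = random.Random(0)
parent = [0.5] * 151
child = boundary_chunk_mutation(parent, rng)
print(sum(1 for a, b in zip(parent, child) if a != b), "genes changed")
```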

The push mutation selects a random contiguous portion of the chromosome (up to 
20% of the total length) and scales the genes from their current value towards either 1 
or 0 by a randomly chosen fixed amount (up to 20%). 
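
One possible reading of the push mutation is sketched below: every gene in the chunk is moved the same fraction of the way toward the chosen limit. This interpretation of "scales ... by a randomly chosen fixed amount" is an assumption, as are the uniform sampling of chunk, direction and amount and all names used.

```python
import random


def push_mutation(chromosome, rng, max_fraction=0.20, max_amount=0.20):
    """Move a random contiguous chunk of genes toward 0 or toward 1.

    Assumption: each gene g in the chunk becomes g + amount * (target - g),
    with one `amount` in [0, max_amount] shared by the whole chunk.
    """
    n = len(chromosome)
    length = rng.randint(1, max(1, int(max_fraction * n)))
    start = rng.randint(0, n - length)
    target = rng.choice((0.0, 1.0))
    amount = rng.uniform(0.0, max_amount)
    mutated = list(chromosome)
    for i in range(start, start + length):
        mutated[i] += amount * (target - mutated[i])
    return mutated


rng = random.Random(1)
parent = [0.5] * 151
child = push_mutation(parent, rng)
print(min(child), max(child))
```

With this reading, genes stay inside the valid range [0, 1] without any clipping, because each gene only moves part of the way toward a limit.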

The smooth mutation selects a random contiguous portion of the chromosome (up 
to 20% of the total length) and smoothes it - the value of each gene in the mutated 
portion is weighted by the value of its neighboring genes: 

gene[i] = 0.2*gene[i-1] + 0.6*gene[i] + 0.2*gene[i+1]    (7)
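
Equation (7) can be applied to a chunk of the chromosome as in the following sketch. Two details are not stated in the text and are assumed here: the weights are applied to the original (unsmoothed) neighbor values, and at the ends of the chromosome the missing neighbor is replaced by the gene itself. The function names are hypothetical.

```python
import random


def smooth_segment(chromosome, start, length):
    """Apply the 0.2 / 0.6 / 0.2 weighting of Eq. (7) to genes start..start+length-1."""
    n = len(chromosome)
    mutated = list(chromosome)
    for i in range(start, start + length):
        left = chromosome[max(i - 1, 0)]        # assumed edge handling
        right = chromosome[min(i + 1, n - 1)]   # assumed edge handling
        mutated[i] = 0.2 * left + 0.6 * chromosome[i] + 0.2 * right
    return mutated


def smooth_mutation(chromosome, rng, max_fraction=0.20):
    """Smooth a random contiguous chunk of up to max_fraction of the chromosome."""
    n = len(chromosome)
    length = rng.randint(1, max(1, int(max_fraction * n)))
    start = rng.randint(0, n - length)
    return smooth_segment(chromosome, start, length)


# A single spike is spread over its neighbors.
print(smooth_segment([0.0, 0.0, 1.0, 0.0, 0.0], 1, 3))
```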
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4 Results 

Figures 3-5 are contour plots of the color-efficiency Pareto-optimal surface for the
three different sources. The heavy, sail-shaped line is the spectrum locus - the set of
chromaticity coordinates for monochromatic lights. The straight line at the bottom
of the "sail", connecting the two end points of the spectrum locus, is known as the
purple line [1]. The spectrum locus bounds color space - no light source can ever plot
outside the spectrum locus. The heavy dashed line is the blackbody locus - the set of
chromaticity coordinates for a blackbody radiator at various temperatures. Light
sources near the blackbody locus in color space are frequently considered more
desirable in the lighting industry, as they tend to appear more "natural". The heavy dot
denotes the chromaticity coordinates of the unfiltered light source (the only point
where efficiency = 1.0).




Fig. 3. Efficiency contours for the high pressure sodium source.

Fig. 4. Efficiency contours for the metal halide source. Horizontal axis: CIE x
chromaticity coordinate.

The shape of the color-efficiency surface is consistent with what one might expect
from the SPDs of the three sources. The amount of color space achievable at any level
of efficiency by the HPS source is much smaller than the area achievable by the other
two sources, due to the HPS lamp having most of its power in a relatively small
region of the visible spectrum. Although both the MH and incandescent sources have
energy spread throughout the spectrum, the MH source has a relatively narrow
(compared to the incandescent) band of high efficiency, related to the relatively spiky
spectrum of the MH lamp.



5 Discussion 

This paper presents a method for determining the color-efficiency Pareto optimal
surface for filtered light sources, which has two major applications. First, as novel
light sources are developed which are extremely efficient but have a color that is
unacceptable for some applications, filtering may adjust the color to an acceptable
region while maintaining a relatively high level of efficiency, permitting a wider
range of application for the lamp technology. For example, the sulfur lamp [5] has a
very high luminous efficacy, but its chromaticity coordinates (x=0.33, y=0.41) are
relatively distant from the blackbody locus, which makes it unsuitable in appearance
for many indoor applications. This lamp could be filtered to bring it much closer to
the blackbody locus while still maintaining high luminous efficacy, which would
make it much more desirable for indoor applications (and, consequently, more
marketable).




Fig. 5. Efficiency contours for the incandescent source. Horizontal axis: CIE x
chromaticity coordinate.

A second major application is filtering existing light source technologies for use in 
new applications. For example, metal halide lamps are used extensively in theatre 
lighting where colored lighting is desired. Currently, the shift in color is achieved by 
modifying the composition of metal salts in the lamp, which tends to reduce the lamp 
life. A filtered standard (i.e., longer life) metal halide lamp might be an attractive 
alternative. The technique described here gives manufacturers the ability to determine
where in color-efficiency space filtered lamps might be competitive.



Abstract. This paper examines the road train concept as a new alternative in
long-distance freight traffic. The design of such a system is a difficult task since
many different and conflicting criteria arise depending on the application spectrum,
the legal conditions and the preferences of the carrier. Furthermore, the evaluation
of each decision alternative relies on a time-consuming and sophisticated
simulation. Evolutionary algorithms (EAs) have been shown to be a useful tool
for multi-objective optimization in engineering design. Based on a unified
model, we develop a problem-specific evolutionary algorithm which features
strong elitism, an unlimited archive of non-dominated solutions and
density-dependent selection. This EA is able to create alternatives which dominate
previous manually engineered solutions as well as those derived from exhaustive
search.



1 Introduction 

In freight traffic the importance of trucks grows constantly. Based on a forecast by the
Prognos-Institute, experts expect an increase of transportation performance on German
roads of 55% by the year 2015 [7]. This would cause a proportional rise in mileage
and a higher traffic load unless transportation regulations are changed. One
possibility to avoid this effect is to increase the vehicle load in freight transportation.

In Germany the vehicle load is limited to a maximum weight of 40 t with a maxi- 
mum vehicle length of 18.75 m. This makes it possible to carry for example two C 782 
containers with a load capacity of max. 25 t. In order to avoid road damages, the 
maximum load per axle of today has to be kept constant. 

Based on concepts currently used in other countries, the ika (Institut für
Kraftfahrwesen, RWTH Aachen) developed a road train concept, consisting of a
semi-trailer truck and two semi-trailers, connected by a dolly (see Figure 1).



E. Zitzler et al. (Eds.): EMO 2001, LNCS 1993, pp. 612-623, 2001.
© Springer-Verlag Berlin Heidelberg 2001

Fig. 1. Example of a road train with two semi-trailers.



First simulations of the longitudinal dynamics of those road trains showed a
considerable decrease in fuel consumption together with an increase in traffic flow on
German highways, although their power trains were not adapted to the new demands.
Furthermore, a rise in maximum speed by 25% from 80 km/h to 100 km/h resulted in a
significantly lower increase in fuel consumption of 5%. Therefore two versions of
the road train are to be developed [6]. Maneuverability and transversal dynamic
behaviour of road trains were investigated in [9].

This paper now focuses on the design space exploration of road trains concerning 
many different optimization criteria. The next section discusses the issues raised in 
road train design, states the resulting optimization problem in terms of decision vari- 
ables and objectives and explains how each decision alternative is evaluated by exten- 
sive numerical simulation of different driving conditions. In section 3 we motivate the 
use of evolutionary algorithms for this design space exploration task and describe 
which algorithmic modifications were necessary to deal with the large number of 
objectives. The results are presented in section 4 and compared to previously known 
solutions. In section 5 we conclude with the implications of our results from the engi- 
neering as well as from the algorithmic point of view. 

2 Optimization of a Road Train 

The optimization of a whole new vehicle concept with respect to fuel consumption and
driving dynamics is a very complex subject, because the lack of existing data and
knowledge leaves a wide open space for experiments concerning the power train and
the overall weight of the road train.




Fig. 2. Engine characteristic graph. 

An increase of weight leads to an increase of road and climbing resistance. That 
changes the engine operating point and therefore its efficiency and fuel consumption. 

Every single driving condition defines a point in the engine characteristic graph. 
The number of revolutions is determined by the velocity of the vehicle and the total 
gear ratio, consisting of rear-axle ratio and transmission ratio. The necessary torque is 
a result of necessary power output (influenced by velocity, efficiency of the gear box, 
acceleration, and road gradient) and revolutions. A lower total gear ratio reduces the 
engine speed. Under the presumption of constant running resistance due to constant 
velocity and road gradient the required power remains unchanged. The line of constant 
power indicates this relationship in Figure 2. 
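
The relation between gear ratio and engine operating point can be illustrated with a short sketch. It assumes a rigid, loss-free drive line (gear box efficiency is ignored) and a given dynamic wheel radius; the wheel radius, the numerical values and the function name are illustrative assumptions, not data from the simulation.

```python
import math


def engine_operating_point(speed_kmh, total_gear_ratio, power_kw, wheel_radius_m):
    """Return (engine speed in rpm, engine torque in Nm) for a steady driving state.

    Assumptions: no slip, no drive-line losses, constant required power.
    """
    v = speed_kmh / 3.6                          # vehicle speed in m/s
    wheel_omega = v / wheel_radius_m             # wheel angular speed in rad/s
    engine_omega = wheel_omega * total_gear_ratio
    rpm = engine_omega * 60.0 / (2.0 * math.pi)
    torque = power_kw * 1000.0 / engine_omega    # P = T * omega
    return rpm, torque


# Same speed and power, two total gear ratios: the lower ratio gives a lower
# engine speed and a higher torque, i.e. a move along the line of constant power.
print(engine_operating_point(80.0, 3.0, 150.0, 0.5))
print(engine_operating_point(80.0, 2.5, 150.0, 0.5))
```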

Long-distance transport vehicles usually drive rather statically, operating at maxi- 
mum authorized speed. This leads to the assumption that the gear ratio should be low 
enough to cause an engine operating point in the area of lowest specific fuel consump- 
tion. However, this area is close to the line of maximum torque. Small increases of the 
running resistance, resulting from headwind or road gradient, cannot be compensated 
by requesting more torque from the engine, but force the driver to shift gears or to go 
at a lower speed. 

Drivability requires a large margin between the most frequent engine operating point
and the line of maximum torque, which calls for powerful engines and high gear ratios
and therefore conflicts with the goal of reducing fuel consumption.

The goal of the optimization is to find a combination of overall weight, gear box,
engine and driving strategy that minimizes fuel consumption, optimizes the driving
performance and increases driving convenience.

Another difficulty in the design process of a long-distance freight vehicle is the
large application spectrum. Some carriers operate only in rather flat areas, such as
the Netherlands. They would obviously prefer a different road train version than a
carrier whose standard route crosses the Alps; the latter puts much more emphasis on
climbing capacity.



2.1 Overall Weight 

Based on current semi-trailer trucks the overall weight of an average prime mover 
with a nominal power of more than 310 kW can be set to about 8 tons. An ordinary 
semi-trailer weighs about 7 tons, leaving a load capacity of 25 tons. Therefore an 
upper limit for the overall weight of the road train can be set at 72 tons, representing a 
prime mover with two full-size semi-trailers. 

Assuming a constant ratio of load capacity and overall trailer weight of 0.78125, 
the load capacity can be varied from 25 tons to 50 tons. 
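
The weight figures above are consistent with each other, as the following arithmetic sketch shows. It uses only the numbers given in the text; the function name is hypothetical, and the 40 t case is included because it corresponds to the current German limit mentioned in the introduction.

```python
PRIME_MOVER_T = 8.0     # prime mover weight in tons
TRAILER_EMPTY_T = 7.0   # empty semi-trailer in tons
TRAILER_LOAD_T = 25.0   # load capacity of one semi-trailer in tons

trailer_gross_t = TRAILER_EMPTY_T + TRAILER_LOAD_T   # 32 t per loaded trailer
load_ratio = TRAILER_LOAD_T / trailer_gross_t        # load / overall trailer weight


def load_capacity(overall_weight_t):
    """Load capacity for a given overall weight at a constant load ratio."""
    return load_ratio * (overall_weight_t - PRIME_MOVER_T)


print(load_ratio)                            # ratio of load to overall trailer weight
print(PRIME_MOVER_T + 2 * trailer_gross_t)   # upper limit of the overall weight
print(load_capacity(40.0), load_capacity(72.0))
```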



2.2 Power Train 

The simulated road train has a power train consisting mainly of a combustion engine 
(diesel), a clutch, a manually shifted transmission and a rear-axle differential. The 
engine used as basis for the simulation is the Mercedes-Benz OM 442 LA. It was 
slightly modified to represent an average modern truck engine. It has a nominal power 
of 314 kW at 2000 rpm and an optimal specific fuel consumption of 193 g/kWh. 

Two standard gear-boxes were chosen for the simulation. Both of them provide 16
gears, with the direct gear being the 15th and 16th, respectively. The efficiency of the
gear box in the direct gear is assumed to be 2% higher than in the other gears, because
the flow of power bypasses the toothed wheels, which cause the loss of efficiency.



2.3 Resulting Design Variables 

One part of the optimization process is the choice of a suitable engine. The engine
characteristic graph of the modified OM 442 LA mentioned above is the basis for the
engines used in the simulations. More powerful engines are created by multiplying the
engine torque by $1 + x_0$, $x_0 \in [0,1]$, at every point of the engine
characteristic graph. The corresponding efficiencies remain unchanged.

The second design variable, $x_1 \in [0,1]$, defines the overall weight and therefore
the load capacity as well.

The ratios of the manual gear-box remain unchanged; the total gear ratio is varied
through $x_2 \in [0,1]$, which controls the rear-axle ratio.

Fig. 3. Elements of PELOPS: stretch-module, decision-module and handling-module.



The driver can influence the vehicle performance by choosing the engine speed at
which to shift gears. $x_3 \in [0,1]$ defines certain engine speeds and accelerations
for initiating gear shifts.

Furthermore, two manual gear boxes can be chosen: $x_4 \in \{0,1\}$ distinguishes
between a gear box with direct 15th gear and one with direct 16th gear.



2.4 Simulation 

Each setting of the design variables represents a decision alternative, and the resulting
vehicle performance is determined through simulation. For this we apply the
simulation tool PELOPS [5], which has been developed by the ika in cooperation with
BMW. It analyses interactions between vehicle, driver and environment. Its three
main elements (see Figure 3) are the stretch-module, the decision-module and the
handling-module. The cause-effect principle is used to realize the relation between
vehicle performance, environmental impacts and the driver's decisions.

Ten objective functions are defined to give a complete characterization of the
vehicle performance considering fuel consumption and driveability. All these objective
values are computed within the simulation. Thus, the resulting multi-objective
optimization problem can be stated as follows (where the function $f$ is implicitly
given by the simulator):

min $y_0$ = time for acceleration 0-40 km/h
min $y_1$ = time for acceleration 40-90 km/h
max $y_2$ = maximal velocity
max $y_3$ = maximal velocity, 1.5% gradient, 14th gear
max $y_4$ = maximal velocity, 1.0% gradient, 16th gear
min $y_5$ = average fuel consumption per ton load and 100 km, 100 km/h
min $y_6$ = average fuel consumption per ton load and 100 km, 80 km/h
max $y_7$ = average speed on a highway (including road gradient)
min $y_8$ = average fuel consumption per ton load and 100 km on a highway
min $y_9$ = number of gear shifts on a highway

where $y = (y_0, \ldots, y_9) = f(x_0, \ldots, x_4)$,

$x_0, x_1, x_2, x_3 \in [0,1]$,

$x_4 \in \{0,1\}$.

Six simulation scenarios are used to calculate the objective function values: a
full-load acceleration, two constant-velocity scenarios (80 km/h and 100 km/h), two
scenarios with constant gradient and the engine operating at full load, and a highway
scenario. The highway scenario consists of an 18 km drive over an empty highway, with
road gradient varying from ^.5 % to +3.9%.



3 Multi-objective Design Space Exploration 

Since the road train is a relatively new concept, little is known about how its
performance depends on the design variables and about the trade-offs between the
objectives. Therefore the aim is to first explore the design space by approximating the
set of non-dominated solutions as well as possible. Neither prior knowledge nor any
preference information [3] is given beforehand in order not to preclude any efficient
solutions from this high-dimensional objective space.



3.1 Evolutionary Algorithms for Multi-objective Optimization 

In engineering design, the design space may be very heterogeneous and the objective
function is often given by a simulator and thus not in an analytical form. Evolutionary
algorithms [1] can deal well with these kinds of optimization problems since they do
not impose any preconditions on the objective function or the type of decision
variables. Their population concept makes them especially suited to multi-objective
problems, where they can approximate a set of efficient solutions in parallel [10].

Evolutionary algorithms work on sets of individuals which represent different
decision alternatives. The individuals undergo a cycle of iterative variation and
selection until some termination criterion is fulfilled. The variation operator usually
consists of recombination (to exchange information between individuals) and mutation
(to alter individuals randomly). In the selection part, the better individuals (with
respect to the objective function) are kept for the production of offspring, while the
worse alternatives are deleted.

In multi-dimensional objective spaces the partial order makes the selection decision
more complicated. Modern multi-objective evolutionary algorithms use selection rules
which avoid aggregating the multiple objectives into a surrogate scalar function and
thus retain the true multi-objective nature of the problem [2].
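
The partial order mentioned above is the Pareto dominance relation. A minimal sketch for a problem with mixed minimization and maximization objectives, such as the one stated in Section 2.4, could look as follows; the function name and the two-objective toy vectors are illustrative only.

```python
def dominates(a, b, maximize):
    """True if objective vector a Pareto-dominates b.

    maximize[i] is True for objectives to be maximized and False for those to
    be minimized. a dominates b if it is no worse in every objective and
    strictly better in at least one.
    """
    strictly_better = False
    for ai, bi, is_max in zip(a, b, maximize):
        if is_max:                 # turn maximization into minimization
            ai, bi = -ai, -bi
        if ai > bi:
            return False
        if ai < bi:
            strictly_better = True
    return strictly_better


# Toy example: (fuel consumption to minimize, maximal velocity to maximize).
directions = (False, True)
print(dominates((30.0, 85.0), (32.0, 80.0), directions))  # better in both
print(dominates((30.0, 78.0), (32.0, 80.0), directions))  # incomparable
```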




Fig. 4. Flow chart of a universal multi-objective evolutionary algorithm (UMMEA). 



3.2 Choice of the Algorithm 

Though a great variety of (multi-objective) evolutionary algorithms has emerged so
far, many engineering design problems still require an application-specific
implementation to be defined or customized. Here, the Unified Model for
Multi-objective Evolutionary Algorithms (UMMEA [4]) is used. This model makes it
possible to systematically combine the different operators that have been proposed
and discussed in the literature and to include one's own problem-specific instances;
a schematic view is given in Figure 4.

As in many other algorithms, an archive is used besides the normal population to
store all non-dominated solutions offline during the run. In our case the archive must
be very large because the possible size of the trade-off surface increases with the
objective space dimension. In practice we do not even have to bound its size at all,
because the long duration of the simulation (about 30 seconds) already limits the total
number of alternatives that can be generated in a reasonable amount of computing
time.
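
An unbounded archive of non-dominated solutions can be maintained with a simple update rule, sketched below for the case where all objectives are minimized. This is a generic illustration rather than the UMMEA implementation; the handling of duplicates (they are not stored twice) is an assumption.

```python
def dominates(a, b):
    """Pareto dominance for minimization of all objectives."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def update_archive(archive, candidate):
    """Return the archive after offering it one new objective vector."""
    if any(dominates(member, candidate) or member == candidate
           for member in archive):
        return archive                      # candidate adds nothing new
    # Remove members that the candidate dominates, then store the candidate.
    return [m for m in archive if not dominates(candidate, m)] + [candidate]


archive = []
for point in [(3, 3), (1, 4), (4, 1), (2, 2), (5, 5)]:
    archive = update_archive(archive, point)
print(sorted(archive))
```

Each update compares the candidate with every archive member, so the cost per update grows linearly with the archive size. That is negligible next to a simulation taking about 30 seconds per alternative.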

The parents for the next generation are selected exclusively from the archive. In
order to avoid genetic drift and an oversampling of easily accessible objective space
regions, it is necessary to employ density-dependent selection: in each iteration the
density is estimated for every point represented by the individuals, and the individuals
are selected with a probability inversely proportional to this density. This leads to a
more uniform distribution of alternatives in the approximated trade-off surface.
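
Density-dependent selection can be sketched as follows. The text does not say how the density is estimated; the sketch assumes a simple neighbor count within a fixed radius in objective space, and the function names, the radius and the toy points are hypothetical.

```python
import math
import random


def densities(points, radius):
    """Assumed density estimate: number of points within `radius` (self included)."""
    return [sum(1 for q in points if math.dist(p, q) <= radius) for p in points]


def select_parents(points, n_parents, radius, rng):
    """Draw parent indices with probability inversely proportional to density."""
    weights = [1.0 / d for d in densities(points, radius)]
    return rng.choices(range(len(points)), weights=weights, k=n_parents)


# Three crowded points and one isolated point in a two-objective space.
points = [(0.0, 0.0), (0.01, 0.0), (0.0, 0.01), (1.0, 1.0)]
rng = random.Random(1)
picks = select_parents(points, 6000, 0.1, rng)
print([picks.count(i) for i in range(len(points))])
```

In this toy case the isolated point is drawn about as often as the three crowded points together, which is the intended effect of counteracting oversampling of crowded regions.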

Each individual represents a vector of design variables. The variation operator for
this study only applies mutation, which is carried out by adding a normally distributed
random variable $z_i$,

$$x_i^{(t+1)} = x_i^{(t)} + z_i ,$$

to each decision variable $x_i$, where a constant $\sigma = 0.02$ was used.
Recombination turned out not to be of use here since the interdependence of the design
variables in every part of the objective space seems to be very high.
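
A minimal sketch of such a mutation for the continuous design variables is given below. It assumes that the constant 0.02 is the standard deviation of a zero-mean normal distribution and that mutated values are clipped back into [0,1]; neither detail, nor the treatment of the binary gear box variable (left out here), is specified in the text.

```python
import random


def mutate(x, rng, sigma=0.02):
    """Add zero-mean Gaussian noise to each continuous variable, clipped to [0, 1]."""
    return [min(1.0, max(0.0, xi + rng.gauss(0.0, sigma))) for xi in x]


rng = random.Random(0)
parent = [0.0, 0.5, 1.0, 0.25]
child = mutate(parent, rng)
print(child)
```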



4 Results 

In order to analyze the quality of the vehicles developed by the evolutionary algorithm,
a road train version is designed in a traditional way, based on simple rules for
optimizing the power train of a truck [8, 11].

In addition, two grid searches over the whole area of possible combinations are per- 
formed, each with a total number of 2160 elements. One of them was restricted to a 
maximum authorized speed of 80 km/h, the other to 100 km/h. 

For the design space exploration with the evolutionary algorithm a hierarchical
approach was used. A first run of the evolutionary algorithm was performed to narrow
the design variable intervals. An analysis of the trade-offs between the different
objectives led to the conclusion that a focus on reducing the fuel consumption would
not necessarily worsen the other objective values to an unacceptable degree.
Furthermore, this goal is the main factor for the profitability of a vehicle concept and
deserves special attention. Therefore we chose the average fuel consumption on the
highway ($y_8$) as the objective value that defines a ranking of the solutions, while
$y_3$ was used to represent the second main part of the driving performance, the
required climbing ability. Here the reduction of the maximum velocity must not exceed
5 km/h. The solutions that did not meet this criterion were removed from the ranking.
The remaining individuals were ranked according to the fuel consumption on highways.
The top solution was considered the best version.
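
The filter-then-rank step can be sketched as follows. The reference against which the 5 km/h reduction is measured is not stated in the text and is passed in as an argument; the dictionary keys, the function name and the three example designs are hypothetical, not simulation results.

```python
def rank_designs(solutions, reference_speed_kmh, max_reduction_kmh=5.0):
    """Drop designs whose climbing speed falls too far below the reference,
    then rank the remaining ones by highway fuel consumption (lower is better)."""
    feasible = [s for s in solutions
                if reference_speed_kmh - s["climb_speed"] <= max_reduction_kmh]
    return sorted(feasible, key=lambda s: s["fuel"])


# Made-up example values.
solutions = [
    {"name": "A", "fuel": 1.30, "climb_speed": 70.0},
    {"name": "B", "fuel": 1.20, "climb_speed": 62.0},  # too slow on the gradient
    {"name": "C", "fuel": 1.25, "climb_speed": 68.0},
]
ranking = rank_designs(solutions, reference_speed_kmh=70.0)
print([s["name"] for s in ranking])
```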

According to Figures 5 and 6 the modified design variable intervals are defined as
follows: $x_0 \in [0.4, 0.6]$; $x_1 = 1$; $x_2 \in [0.3, 0.4]$; $x_3 \in [0, 0.5]$;
$x_4 = 0$ for the 100 km/h road train and $x_0 \in [0.0, 0.4]$; $x_1 = 1$;
$x_2 \in [0.55, 0.85]$; $x_3 \in [0, 0.5]$; $x_4 = 0$ for the 80 km/h road train.

Limited to those intervals, a second run of the same evolutionary algorithm then
achieved a more exact approximation of the Pareto set in the region of interest. Of
course, there are other ways to cope with the large number of incomparable
alternatives in the presence of many objectives. These typically rely on preference
information, for instance aggregating (or dropping) objectives, lexicographic ordering
or the transformation of objectives into constraints. In many cases, however, it is very
difficult to derive an exact numerical representation of the preferences, even if the
designer certainly has some fuzzy preferences in mind. Moreover, since we had
different decision makers with different preferences in mind, the aim is first to
explore the Pareto set as broadly as possible with a minimum number of simulations
before exploiting interesting regions through restriction of the decision variable space
as described above. Finally, it should be mentioned that even dropping highly
correlated objectives does not help, since these correlations are usually not known in
advance, can differ considerably in different regions of the search space, and do not
contribute to the dimensionality of the Pareto set.

Final results show a huge advantage of road trains with respect to fuel consumption.
A decrease of 23% (80 km/h version) and 26% (100 km/h version), respectively, is
achieved on highways in spite of the rather tough gradients. In steady-state operation
fuel consumption advantages of up to 35% are accomplished. With acceleration being
at a sensible level, the road trains have no disadvantages in climbing ability and
required gear shifts.

Fig. 5. Design variables and specific fuel consumption for the 100 km/h road train.

Fig. 6. Design variables and specific fuel consumption for the 80 km/h road train.

Fig. 7. Comparison between a normal truck and a road train concerning fuel consumption on a
highway, maximum authorized speed of 100 km/h (left) and 80 km/h (right).

Fig. 8. Specific fuel consumption on a highway for the 100 km/h road train (left) and the 80
km/h road train (right).



The comparison of the different road train versions indicates that the evolutionary
algorithm is able to generate solutions which dominate all other results. With the
same climbing ability and acceleration as the traditionally developed versions and the
ones obtained by a grid scan over the whole parameter area, the EA solution needs
about 1% less fuel on the highway. The 100 km/h version is even better than the best
version found by a grid scan of 1000 elements distributed over the narrowed
intervals.

Fig. 9 shows the relation between the objective functions $y_3$ (maximal velocity in
14th gear with 1.5% road gradient) and $y_8$ (specific fuel consumption on highway).
This relation provides information about the trade-off between drivability and fuel
economy.

The creation of 1300 individuals already produces a rather large number of solutions
that have to be considered better than any solution found without the evolutionary
algorithm. This advantage in efficiency will become even more important when more
sophisticated driving scenarios - and thus more time-consuming simulations - are
used, which is subject to further research.





Fig. 9. Velocity (1.5% road gradient) and specific fuel consumption on a highway for the 100
km/h road train (left) and the 80 km/h road train (right).



5 Conclusion and Outlook 

The results of the design space exploration showed a number of interesting new
aspects concerning the optimization of the road train concept. First of all, road trains
have huge advantages compared to standard trucks concerning fuel consumption per
load. This efficiency is certainly the dominant aspect when developing vehicles
for long-distance transport. Furthermore, the use of rather powerful engines
considerably increases the climbing ability of the road trains without massively
worsening the fuel consumption.

The approximation of the Pareto set enables engineers and carriers to choose the right 
configuration for a specific application. A manual customization for every application 
spectrum would require a huge amount of time and work. Thus the design space exploration 
performed in this paper is a powerful tool in vehicle development. Other possible future 
fields of operation are the analysis and optimization of vehicles equipped with driver 
assistance systems or collision avoidance, which need extensive human-machine 
interaction. 

From the algorithmic point of view, the optimization problem turned out to be a 
challenging task for the evolutionary algorithm because of its high-dimensional objective 
space. Since on average more than 30% of the generated solutions were non-dominated, a 
huge archive was needed to reflect the whole range of possible efficient solutions. In 
this situation it is difficult to maintain an appropriate selection pressure towards the 
real trade-off surface. Therefore we chose a strongly elitist approach, i.e. all parents 
were drawn from the archive. This was not harmful with respect to premature convergence, 
as the archive always exhibits enough diversity. On the contrary, the solutions have even 
proven to be too diverse to make recombination of distant individuals viable. Finally, a 
density-based selection was necessary to reach a good distribution of alternatives over 
all regions of the potential trade-off surface: the parents were sampled with a 
probability reciprocal to their estimated density instead of just applying a rank-based 
selection scheme based on the densities. Future enhancements of the algorithm could 
include a scheme to combine both density and preference information in the selection 
process. 
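
As a rough illustration of the density-based parent selection described above, the 
following Python sketch samples parents from an archive with probability proportional to 
the reciprocal of an estimated density. The density estimator (a neighbour count within a 
fixed radius in objective space) and all function names are assumptions made for this 
example; the text does not specify how the density was estimated. 

```python
import math
import random

def estimate_density(points, radius):
    """Hypothetical estimator: archive members within `radius` of each point (self included)."""
    return [sum(1 for q in points if math.dist(p, q) <= radius) for p in points]

def sample_parents(points, n_parents, radius, rng=random):
    """Draw parent indices with probability proportional to 1 / estimated density."""
    densities = estimate_density(points, radius)
    weights = [1.0 / d for d in densities]  # d >= 1 because every point counts itself
    return rng.choices(range(len(points)), weights=weights, k=n_parents)
```

With three clustered points and one isolated point, the isolated point receives weight 1 
and each clustered point weight 1/3, so the isolated point is drawn about half of the time. 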

Abstract. In this paper, a new multiobjective genetic algorithm is employed to 
support the design of a hydraulic actuation system. First, the proposed method is 
tested using benchmark problems gathered from the literature. The method performs 
well and is capable of identifying multiple Pareto frontiers in multi-modal 
function spaces. Secondly, the method is applied to a mixed variable design 
problem where a hydraulic actuation system is analyzed using simulation models. 
The design problem consists of determining continuous variables as well as 
selecting components from catalogs. The multi-objective optimization results in a 
discrete Pareto front, which illustrates the trade-off between system cost and 
system performance. 



1 Introduction 

Most engineering design problems involve several, often conflicting, objectives. In 
many cases, the multiple objectives are aggregated into one single overall objective 
function. Optimization is then conducted with one optimal design as the result. The 
result then depends strongly on how the objectives are aggregated. To avoid this 
difficulty, and in order to explore a broader set of optimal solutions, the concept of 
Pareto optimality is employed. Valuable insight into the trade-off between the 
objectives can be gained by investigating the set of Pareto optimal solutions. Vilfredo 
Pareto defined Pareto optimality as the set where every element is a problem solution 
for which no other solution can be better in all design attributes. A solution in a 
Pareto optimal set cannot be deemed superior to the others in the set without including 
preference information to rank competing attributes. 

This paper develops a Pareto optimization method for use in multiobjective, multi-modal 
design spaces. For a general design problem, the design space consists of continuous 
variables as well as selections of individual components from catalogs or databases. 
Furthermore, numerical simulations and other CAE tools are often employed to evaluate 
design solutions; i.e. simulation is employed to transform solutions from the design 
space to the attribute space. As the attributes or objectives are calculated using 
numerical simulations, there is no simple way of obtaining derivatives of the objective 
functions. Therefore genetic algorithms are well suited for such applications: they do 
not need derivatives of the objective functions and they have been shown to be effective 
in optimizing mixed variable problems in multi-modal search spaces. 



E. Zitzler et al. (Eds.): EMO 2001, LNCS 1993, pp. 624-638, 2001. 
© Springer-Verlag Berlin Heidelberg 2001 

Multiobjective Optimization of Mixed Variable Design Problems 625 

The paper first defines a general multiobjective optimization problem and reviews 
related work on multiobjective genetic algorithms. Then, a new method is proposed 
and validated using a problem gathered from the literature. Later the method is applied 
to a real design problem containing a mixture of continuous design variables and dis- 
crete selections of components from catalogs. The problem is solved by connecting the 
optimization strategy to a simulation program. 



1.1 The Multiobjective Design Problem 

A general multiobjective design problem could be expressed by equations (1) and (2). 

\[ \min F(x) = \left( f_1(x), f_2(x), \ldots, f_k(x) \right)^T \quad \text{s.t. } x \in S \tag{1} \]

\[ x = (x_1, x_2, \ldots, x_n)^T \tag{2} \]

where $f_1(x), f_2(x), \ldots, f_k(x)$ are the $k$ objective functions, $(x_1, x_2, \ldots, x_n)$ 
are the $n$ optimization parameters, and $S \subseteq \mathbb{R}^n$ is the solution or 
parameter space. Obtainable objective vectors, $\{F(x) \mid x \in S\}$, are denoted by $Y$. 
$Y \subseteq \mathbb{R}^k$ is usually referred to as the attribute space. 

The Pareto set consists of solutions that are not dominated by any other solutions. 
Consider a minimization problem and two solution vectors $x, y \in S$. Then $x$ is said 
to dominate $y$, denoted $x \succ y$, if: 

\[ \forall i \in \{1, 2, \ldots, k\}: f_i(x) \le f_i(y) \quad \text{and} \quad \exists j \in \{1, 2, \ldots, k\}: f_j(x) < f_j(y) \tag{3} \]

The space in $\mathbb{R}^k$ formed by the objective vectors of Pareto optimal solutions is 
known as the Pareto optimal front. 
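
The dominance test of equation (3) and the resulting non-dominated set can be sketched in 
a few lines of Python. This is a minimal illustration for minimization problems; the 
function names are chosen for this example only. 

```python
def dominates(fx, fy):
    """True if objective vector fx dominates fy (minimization), as in equation (3)."""
    no_worse = all(a <= b for a, b in zip(fx, fy))
    strictly_better = any(a < b for a, b in zip(fx, fy))
    return no_worse and strictly_better

def pareto_front(vectors):
    """Return the objective vectors that are not dominated by any other vector."""
    return [v for v in vectors if not any(dominates(w, v) for w in vectors)]
```

For the vectors (1,4), (2,2), (3,3) and (4,1), only (3,3) is dominated (by (2,2)), so the 
other three form the front. 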



1.2 Multiobjective Genetic Algorithms 

Genetic algorithms are modeled after mechanisms of natural selection. Each optimization 
parameter ($x_i$) is encoded by a gene using an appropriate representation, such as 
a real number or a string of bits. The corresponding genes for all parameters $x_1, \ldots, x_n$ 
form a chromosome capable of describing an individual design solution. A set of 
chromosomes representing several individual design solutions comprises a population 
where the most fit are selected to reproduce. Mating is performed using crossover to 
combine genes from different parents to produce children. The children are inserted 
into the population and the procedure starts over again, thus creating an artificial Dar- 
winian environment. For a general introduction to genetic algorithms, see l|^. 
Additionally, there are many different types of multiobjective genetic algorithms. 
Literature surveys and comparative studies on multiobjective genetic algorithms could 
be found in for example p^and fT^ . 

Most multiobjective genetic algorithms use either the selection mechanism or some 
sort of Pareto based ranking to produce non-dominated solutions. In the proposed 
method, the ranking scheme presented by Fonseca and Fleming is employed. 

In the multiobjective GA (MOGA), each individual is ranked according to its degree of 
dominance. The more population members that dominate an individual, the higher (i.e. 
worse) the rank it is given. Here an individual's rank equals the number of individuals 
that it is dominated by plus one, see Figure 1. Individuals on the current Pareto front 
will have a rank of 1, as they are non-dominated. The ranks are then scaled to score 
individuals in the population. In MOGA both sharing and mating restrictions are employed 
in order to maintain population diversity. Fonseca and Fleming also introduce preference 
information and goal levels to reduce the Pareto set to those solutions that 
simultaneously meet certain attribute values. 




Fig. 1. Population ranking according to Fonseca and Fleming. 
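
The ranking rule described above (rank equals one plus the number of dominating 
individuals) can be sketched as follows. The code assumes minimization of all objectives; 
the function names are illustrative and not taken from any library. 

```python
def dominates(fx, fy):
    """True if objective vector fx dominates fy (minimization)."""
    return all(a <= b for a, b in zip(fx, fy)) and any(a < b for a, b in zip(fx, fy))

def moga_ranks(vectors):
    """Rank of each vector: 1 + number of population members that dominate it."""
    return [1 + sum(dominates(w, v) for w in vectors) for v in vectors]
```

For the population (1,4), (2,2), (3,3), (4,4) the ranks are 1, 1, 2 and 4: the last vector 
is dominated by all three others. 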

Although there is a substantial body of research on multiobjective genetic algorithms, 
there are still important issues that current methods address with only partial 
success. The methods typically require extensive genetic algorithm parameter tuning 
on a problem-by-problem basis in order for the algorithm to perform well. However, 
in a real-world problem there is little knowledge about the shape of the attribute space, 
which makes it difficult to assess problem specific parameters. Additionally, existing 
methods do not consistently locate multiple Pareto frontiers in multi-modal problem 
spaces. The method presented in this paper is capable of identifying multiple frontiers 
without any problem specific parameter tuning. 



2 The Proposed Method 

The multiobjective struggle genetic algorithm (MOSGA) ^ combines the struggle 
crowding genetic algorithm JT] with Pareto-based ranking as devised in |F| . 

In the struggle algorithm, a variation of restricted tournament selection [8], two 
parents are chosen randomly from the population, and crossover and mutation are 
performed to create a child. The child then has to compete with the most similar 
individual in the entire population, and replaces it if the child has a better fitness. 
This replacement strategy counteracts genetic drift that can spoil population diversity. 
The struggle genetic algorithm has been demonstrated to perform well in multi-modal 
function landscapes where it successfully identifies and maintains multiple peaks. 

There is no single objective function to determine the fitness of the different indi- 
viduals in a Pareto optimization. Therefore, the ranking scheme presented by Fonseca 
and Fleming is employed, and the “degree of dominance” is used to rank the popula- 
tion. Each individual is given a rank based on the number of individuals in the popula- 
tion that are preferred to it, i.e. for each individual the algorithm loops through the 
population counting the number of preferred individuals. "Preferred to" could be im- 
plemented in a strict Pareto optimal sense or extended to include goal levels on the 
objectives in order to limit the frontier. 

The principle of the MOSGA algorithm is outlined below. 

Step 1: Initialize the population. 

Step 2: Select parents randomly from the population. 

Step 3: Perform crossover and mutation to create a child. 

Step 4: Calculate the rank of the child, and a new ranking of the population that con- 
siders the presence of the child. 

Step 5: Find the most similar individual, and replace it with the new child if the 
child’s ranking is better. 

Step 6: Update the ranking of the population if the child has been inserted. 

Step 7: Perform steps 2-6 until the mating pool is filled. 

Step 8: If the stop criterion is not met go to step 2 and start a new generation. 
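
The steps above can be sketched in Python for a single real-valued design variable. This 
is an illustrative reading of the outline, not the authors' implementation: the crossover 
and mutation operators, the attribute-space similarity measure, the fixed number of 
generations as stop criterion and all names are assumptions. Ranks are recomputed on 
demand, so the ranking update of step 6 is implicit. 

```python
import random

def dominates(fx, fy):
    return all(a <= b for a, b in zip(fx, fy)) and any(a < b for a, b in zip(fx, fy))

def rank(f, others):
    """One plus the number of objective vectors in `others` that dominate f."""
    return 1 + sum(dominates(g, f) for g in others)

def mosga(evaluate, bounds, pop_size=30, generations=50, seed=0):
    rng = random.Random(seed)
    lo, hi = bounds
    # Step 1: initialize the population.
    pop = [rng.uniform(lo, hi) for _ in range(pop_size)]
    objs = [evaluate(x) for x in pop]
    for _ in range(generations):          # Step 8: assumed stop criterion
        for _ in range(pop_size):         # Step 7: one child per population slot
            # Step 2: select two parents randomly.
            a, b = rng.sample(pop, 2)
            # Step 3: blend crossover plus a small Gaussian mutation (assumed operators).
            m, d = 0.5 * (a + b), abs(a - b)
            child = min(hi, max(lo, rng.uniform(m - d, m + d) + rng.gauss(0.0, 0.01)))
            fc = evaluate(child)
            # Step 5a: most similar individual, measured here in attribute space.
            j = min(range(pop_size),
                    key=lambda i: sum((p - q) ** 2 for p, q in zip(objs[i], fc)))
            # Step 4: rank child and rival against the population including the child.
            everyone = objs + [fc]
            # Step 5b: replace the rival if the child's rank is better (lower).
            if rank(fc, everyone) < rank(objs[j], everyone):
                pop[j], objs[j] = child, fc
    return pop, objs
```

A call such as `mosga(lambda x: (x * x, (x - 2.0) ** 2), (-5.0, 5.0))` runs the sketch on 
a simple two-objective problem chosen here only for demonstration. 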

The similarity between two individuals is measured using a distance function. 
The method has been tested with distance functions based upon the Euclidean distance 
in the attribute space as well as in the parameter space. A mixed distance function 
combining both the attribute and the parameter distance has been evaluated as well. 



2.1 Genome Representation 

The genome encodes design variables in a form suitable for the GA to operate upon. 
Design variables may be values of parameters (real or integer) or represent individual 
components selected from catalogs or databases. Thus, the genome is a hybrid list of 
real numbers (for continuous parameters), integers and references to catalog selec- 
tions, see Figure 2. 

A catalog can be either a straight list of elements, or the elements can be arranged 
in a hierarchy. Each element of a catalog represents an individual component. 
The characteristics of catalogs are discussed further below and exemplified by the 
design example. 

Fig. 2. Example of the genome encoding. The first two elements represent real variables and the 
last two elements catalog selections. 



2.2 Similarity Measures 

Speciating GAs require a measure of likeness between individuals, a so-called similarity 
measure. Here the similarity measure is based on a distance function calculating 
the distance between two genomes. The similarity can be based on the distance in 
either the attribute space (between the objectives), the phenotype space (between the 
design parameters) or the genotype space (in the genome encoding). As direct encoding 
is used (not a conversion to a string of bits), a phenotype and a genotype distance 
function yield the same result. It is shown in Section 2.4 that the choice between an 
attribute based and a parameter based distance function can have a great influence on 
the outcome of the optimization. 



Attribute Based Distance Function 

One way of comparing two individual designs is to calculate their distance in attribute 
space. As we want the population to spread evenly on the Pareto front (in attribute 
space) it seems to be a good idea to use an attribute based distance measure. The dis- 
tance between two solutions (genomes) in attribute space is calculated using the nor- 
malized Euclidean distance (4). 



\[ \mathrm{Distance}(a, b) = \sqrt{ \frac{1}{k} \sum_{i=1}^{k} \left( \frac{f_{ia} - f_{ib}}{f_{i\max} - f_{i\min}} \right)^2 } \tag{4} \]

where $f_{ia}$ and $f_{ib}$ are the values of the i:th objective for a and b respectively, 
$f_{i\max}$ and $f_{i\min}$ are the maximum and the minimum of the i:th objective in the 
current population, and $k$ is the number of objectives. Thus, the distance function will 
vary between 0, indicating that the individuals are identical, and 1 for the very 
extremes. 
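
A small Python sketch of the normalized attribute distance follows. It assumes that the 
squared, range-normalized differences are averaged over the k objectives before taking the 
square root, which is what makes the result fall between 0 and 1; skipping objectives 
whose range is zero is an additional assumption of this example. 

```python
import math

def attribute_distance(fa, fb, f_min, f_max):
    """Normalized Euclidean distance between two objective vectors, in [0, 1]."""
    k = len(fa)
    total = 0.0
    for a, b, lo, hi in zip(fa, fb, f_min, f_max):
        span = hi - lo
        if span > 0:  # assumption: an objective without spread contributes nothing
            total += ((a - b) / span) ** 2
    return math.sqrt(total / k)
```

Two individuals at opposite extremes of every objective are at distance 1, and identical 
individuals at distance 0. 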



Phenotype Based Distance Function 

Another way of calculating the distance between solutions is to use the distance in 
parameter (phenotype) space. As the genome is a hybrid mixture of real numbers and 
catalog selections, we have to define different distance functions to work on different 
types of elements. The methods described here build on the framework presented by 
Senin et al. In order to obtain the similarity between two individuals, the distance 
between each pair of corresponding search variables is calculated. The overall 
similarity is then obtained by summing up the distances for each search variable. 



Real Number Distance 

A natural distance measure between two real numbers is the normalized Euclidean 
distance, see equation (5). 






\[ \mathrm{Distance}(a, b) = \frac{|a - b|}{\text{max distance}} \tag{5} \]

where a and b are the values of the two real numbers and max distance is the 
maximum possible distance between the two values (i.e. the distance between the search 
boundaries). 



Catalog Selection Distance 

The distance between two catalog selections can be measured through their relative 
position in a catalog or a catalog hierarchy. The relative position is only meaningful 
if the catalog is ordered, see Figure 3. 

Fig. 3. Examples of ordered and unordered catalogs (the unordered example is a catalog of 
hydraulic cylinders). 



The dimensionless distance between two elements within the same catalog is expressed 
by equation (6) and exemplified in Figure 4. 

\[ \mathrm{Distance}(a, b) = \frac{|\mathrm{pos}(a) - \mathrm{pos}(b)|}{\text{max distance}} \tag{6} \]

Fig. 4. Distance evaluation for two elements of an ordered catalog. In the depicted 
example, Distance(a,b) = 3/5. 

For catalog hierarchies, equation (6) has to be generalized. For elements belonging 
to the same sub-catalog, the distance is evaluated using the relative position within that 
sub-catalog. Otherwise, the maximum length of the path connecting the different 
sub-catalogs is used. This implies that for two given sub-catalogs an element in one 
catalog is equally distant from every element in the other catalog. The length of the 
path is calculated as the maximal distance within the smallest common hierarchy. In both 
cases, the distance is normalized by dividing by the maximum distance (i.e. the 
catalog size). 




Fig. 5. Exemplification of distances between different catalog elements in a hierarchical cata- 
log. 
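
The following Python sketch shows one possible reading of the catalog distance for a 
two-level hierarchy. A selection is represented as a (sub-catalog index, position) pair, 
and the maximum distance is taken to be the total number of elements in the catalog; both 
the representation and this normalization are assumptions made for illustration. 

```python
def catalog_distance(a, b, catalog):
    """Distance between selections a and b, each a (sub_catalog_index, position) pair.

    `catalog` is a list of ordered sub-catalogs (lists of components).
    """
    max_distance = sum(len(sub) for sub in catalog)  # assumption: the catalog size
    (sub_a, pos_a), (sub_b, pos_b) = a, b
    if sub_a == sub_b:
        # Same sub-catalog: relative position, normalized.
        return abs(pos_a - pos_b) / max_distance
    # Different sub-catalogs: the smallest common hierarchy is the whole catalog,
    # so every element of one sub-catalog is equally (maximally) distant from the other.
    return 1.0
```

With a catalog of three proportional valves and two servo valves, two proportional valves 
two positions apart are at distance 2/5, while any proportional valve is at distance 1 
from any servo valve. 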



Overall Distance 

So far, distance measures for individual design variables have been developed. An 
overall distance measure for comparing two genomes is obtained by aggregating the 
distances for the individual design variables, see equation (7). 

\[ \mathrm{Distance}(a, b) = \sum_{i=1}^{n} \frac{\mathrm{Distance}(DV_i)}{n} \tag{7} \]

where a and b are the two designs being compared, and n is the number of design 
variables (DV) encoded by the genome. Thus, the phenotype distance between two 
individual designs is calculated by summing up the individual distances for each 
element of the genome and dividing by n. 
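
As a sketch of equation (7), the code below averages per-variable distances over a hybrid 
genome of real numbers and positions in flat ordered catalogs. The `spec` structure that 
describes each design variable is a hypothetical helper introduced for this example. 

```python
def real_distance(a, b, lower, upper):
    """Normalized distance between two real values within the search boundaries."""
    return abs(a - b) / (upper - lower)

def ordered_catalog_distance(pos_a, pos_b, max_distance):
    """Normalized distance between two positions in an ordered catalog."""
    return abs(pos_a - pos_b) / max_distance

def phenotype_distance(genome_a, genome_b, spec):
    """Average of the per-variable distances.

    `spec` holds one entry per design variable, either ("real", lower, upper)
    or ("catalog", max_distance).
    """
    total = 0.0
    for a, b, s in zip(genome_a, genome_b, spec):
        if s[0] == "real":
            total += real_distance(a, b, s[1], s[2])
        else:
            total += ordered_catalog_distance(a, b, s[1])
    return total / len(spec)
```

For example, with two real variables on [0, 10] and [0, 1] and one catalog with maximum 
distance 5, the genomes [2.0, 0.5, 0] and [7.0, 0.5, 3] are at distance (0.5 + 0 + 0.6)/3. 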

Mixed Distance Function 

As we will show later, different distance functions have different properties. By 
combining an attribute based distance function with a parameter based one, the strengths 
of both methods could be gained. As each distance function is normalized, the mixed 
distance function is simply calculated according to equation (8). 

\[ \mathrm{MixDistance}(a, b) = \frac{\mathrm{AttributeDistance}(a, b) + \mathrm{PhenotypeDistance}(a, b)}{2} \tag{8} \]



2.3 Genetic Operators 



As the genome is a hybrid mix of continuous variables and catalog selections, we 
define different operators to work on different types of elements. Uniform crossover is 
used, which implies that each element of the father's genome is crossed with the 
corresponding element from the mother's genome. 

For real numbers BLX crossover is used, see exemplification in Figure 6. 




Fig. 6. The outcome of a BLX crossover between two real numbers a and b is randomly se- 
lected from an interval of width 2d centered on the average M. 
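
A minimal Python sketch of such a blend crossover for one real variable is given below. 
The caption does not define d; the sketch assumes d = |a - b|, which corresponds to the 
common BLX-0.5 variant, and clips the child to the search boundaries. Both choices are 
assumptions. 

```python
import random

def blx_crossover(a, b, lower, upper, rng=random):
    """Child drawn uniformly from an interval of width 2d centered on the parents' mean."""
    m = 0.5 * (a + b)
    d = abs(a - b)  # assumption: half-width equals the parent distance (BLX-0.5)
    child = rng.uniform(m - d, m + d)
    return min(upper, max(lower, child))  # keep the child inside the search boundaries
```

For parents 0.4 and 0.6 the child lies in [0.3, 0.7]; identical parents reproduce 
themselves. 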



For catalog selections, an analogous crossover scheme is employed, as illustrated in 
Figure 7. 

Fig. 7. An exemplification of the catalog crossover. The outcome of a crossover of individuals 
within the same catalog (a and b) is randomly selected from the interval between them. For 
individuals from different sub-catalogs (c and d) the outcome is randomly selected within the 
smallest common hierarchy. 
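
The catalog crossover can be sketched for a two-level hierarchy as follows. Selections are 
represented as (sub-catalog index, position) pairs; this representation, the inclusion of 
both parents in the interval, and the uniform choice over all elements when the parents 
come from different sub-catalogs are assumptions made for this example. 

```python
import random

def catalog_crossover(a, b, catalog, rng=random):
    """Cross two selections a and b, each a (sub_catalog_index, position) pair."""
    (sub_a, pos_a), (sub_b, pos_b) = a, b
    if sub_a == sub_b:
        # Same sub-catalog: pick a position from the interval between the parents.
        lo, hi = sorted((pos_a, pos_b))
        return (sub_a, rng.randint(lo, hi))
    # Different sub-catalogs: the smallest common hierarchy is the whole catalog,
    # so any element of any sub-catalog may be selected.
    elements = [(s, p) for s, sub in enumerate(catalog) for p in range(len(sub))]
    return rng.choice(elements)
```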



2.4 Test Function 






In order to assess the performance of the algorithm a set of test problems from Deb [4] 
was explored. Deb developed a set of problems to highlight difficulties that multi- 
objective genetic algorithms may encounter. For visualization reasons, the focus is on 
two-dimensional problems defined generally by equations (9) and (10). 

\[ f_1(x_1, x_2) = x_1 \tag{9} \]

\[ f_2(x_1, x_2) = \frac{g(x_2)}{x_1}, \quad g(x_2) > 0, \; x_1 > 0 \tag{10} \]

If the function g is multi-modal, the corresponding multiobjective problem will 
have global and local Pareto-optimal frontiers. A multi-modal g function is defined in 
equation (11). 

\[ g(x_2) = 2 - \exp\left\{ -\left( \frac{x_2 - 0.2}{0.004} \right)^2 \right\} - 0.8 \exp\left\{ -\left( \frac{x_2 - 0.6}{0.4} \right)^2 \right\} \tag{11} \]



Figure 8(a) shows the g function for $0 < x_2 < 1$, with the global optimum located at 
$x_2 = 0.2$ and a local optimum at $x_2 = 0.6$. Figure 8(b) shows a plot of $f_1$ and $f_2$ 
in the attribute space with the global and local Pareto optimal solutions. 10,000 
randomly chosen solutions are generated and plotted in Figure 8(b) to illustrate that the 
problem is biased: the solution density is higher towards the local Pareto-optimal front. 
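
The test problem can be evaluated with a few lines of Python. The sketch below implements 
the two objectives with the bimodal g function of Deb's test problem; the function names 
are chosen for this example. 

```python
import math

def g(x2):
    """Bimodal g: narrow global minimum at x2 = 0.2, broad local minimum at x2 = 0.6."""
    return (2.0
            - math.exp(-((x2 - 0.2) / 0.004) ** 2)
            - 0.8 * math.exp(-((x2 - 0.6) / 0.4) ** 2))

def objectives(x1, x2):
    """Objective vector (f1, f2) for x1 > 0."""
    return x1, g(x2) / x1
```

Evaluating g gives approximately 0.706 at the global optimum $x_2 = 0.2$ and 1.2 at the 
local optimum $x_2 = 0.6$. The global basin is very narrow (width parameter 0.004), which 
is why randomly generated solutions rarely fall on the global Pareto-optimal front. 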





Fig. 8. Figure (a) shows the multi-modal function $g(x_2)$, where the global optimum is 
situated at $x_2 = 0.2$ and the local optimum at $x_2 = 0.6$. For the multiobjective 
problem, an $f_1$-$f_2$ plot for 10000 random solutions is shown in (b). Notice the low 
solution density at the global Pareto optimal front. 

The optimization was conducted with a population size of 60 individuals and ran 
for 200 generations. The variables are real encoded, and BLX crossover is employed 
to produce offspring. Deb reported that the NSGA was trapped in the local Pareto 
front in 59 out of 100 runs. 

The original MOSGA algorithm used an attribute based distance function resulting 
in the algorithm converging to the local Pareto frontier in only 7% of 100 optimiza- 
tions. The algorithm found the preferred global Pareto optimal front in 86% of the 
optimizations, as shown in figure 9 (a) and (b). In 7% of the optimizations, it con- 
verged to both frontiers. Thus, the MOSGA seems more robust in locating the global 
Pareto optimal frontier. 

However, one wishes that the algorithm were capable of identifying both frontiers 
in every optimization run. This can be achieved by changing to a parameter based 
distance function, although that distance function was slower and less exact in its 
convergence to the frontier. 

In the MOSGA, the new child has to compete with the individual most similar to itself. 
When the comparison is done in parameter space, a portion of the population will 
find and maintain local optima, where solutions close in the parameter space are all 
dominated. When using an attribute based distance function, solutions at local optima 
might have to compete with solutions at the global optimum, as they might be close in 
attribute space. Therefore, local optima would not be maintained. 



By combining equally weighted attribute-based and parameter-based distance 
functions to form a mixed distance measure, the advantages of fast convergence and 
the ability to find multiple frontiers were both realized. Figure 9 shows how the 
algorithm spreads the population evenly on both frontiers when using the mixed distance 
function. To summarize, the attribute distance function performs well on problems 
with one Pareto frontier. For problems with multiple frontiers, a mixed distance 
function is preferred. A more detailed discussion of the properties of the algorithm is 
given in [1] and [2]. 

Fig. 9. Optimization results using different distance functions. In (a) and (b) an attribute based 
distance function is used and the population has converged to the global Pareto front. In (c) and 
(d) the mixed distance function is used and the population converges to both the global and the 
local frontier. (a) and (c) show the result in attribute space, whereas (b) and (d) show the result 
in parameter space. 



Thus, the method is capable of reliably identifying multiple Pareto frontiers in a 
single optimization run, outperforming other techniques. Another advantage is that the 
method does not require problem specific parameter settings. The only GA parameters 
that have to be determined are population size, number of generations and the distance 
function. The method has been successfully tested on several benchmark problems 
proposed by Deb, see [1]. 



3 Design Example 

The object of study for the design example is a hydraulic actuation system. The system 
consists of a hydraulic cylinder that is connected to a mass. The motion of the mass is 
controlled by a directional valve, which in turn is controlled by a proportional con- 
troller. The system is powered from a constant pressure hydraulic supply system. 

In order to investigate the properties of different designs, the system has been mod- 
eled in the simulation package Hopsan |^. For every new genome, the optimization 
strategy calls the simulation program to evaluate that particular design. Each compo- 
nent in the simulation model consists of a set of algebraic and differential equations 
taking aspects such as friction, leakage and non-linearities into account. A graphical 
representation of the system model is depicted in figure 10. 




Fig. 10. The simulation model of the hydraulic actuation system. The main components are 
(from the upper left): cylinder, mass, pulse generator, p-controller, directional valve, 
accumulator and constant pressure pump. 

The objective of the study is to design a system that has good controllability at a 
low cost. Naturally, these two objectives are in conflict with each other. To achieve 
good controllability we can choose a fast servo valve, which is more expensive than a 
slower proportional valve. Therefore, there is a trade-off between cost and 
controllability. The cost for a particular design is composed of the cost for the 
individual components as well as the cost induced by the energy consumption. 

The system has been studied for a pulse in the position command. The control error 
and the energy consumption are calculated based on the simulation result. 

When designing the system, cylinders and valves are selected from a catalog of 
existing components. Other parameters, such as the control parameter, a leakage 
coefficient and the maximal flow of the supply system, have to be determined as well. 
Thus the problem is multiobjective with two objectives and five optimization variables, 
of which two are discrete catalog selections and three are continuous variables. 



3.1 Component Catalogs 



For the catalog selections, catalogs of valves and cylinders have been created. For the 
directional valve, the choice is between a slow but cheap proportional valve or an 
expensive and fast servo valve. Valves from different suppliers have been arranged in 
two ordered sub-catalogs as depicted in figure 11. The same structure applies to the 
cylinders as they are divided into sub-catalogs based on their maximal pressure level. 
The pressure in the system has to be controlled so that the maximum pressure for the 
cylinder is not exceeded. A low-pressure system is cheaper but has inferior perform- 
ance compared to a high-pressure system. 



Fig. 11. The catalog of directional valves is divided into proportional valves and servo valves. 
Each sub-catalog is ordered based on the valve size. For each component, a set of parameters 
describing the component is stored together with information on price and weight. 



Naturally, the component catalog is connected to the simulation program. The 
optimization strategy, however, needs information about the topology of the catalog in 
order for the genetic operators to work. 



3.2 Optimization Results 

The system has been optimized using a population of 40 individuals and 400 generations. 
In order to limit the Pareto frontier, a goal level on the control error was introduced. 
The goal level corresponds to the highest acceptable control error. Without such a goal 
level, the result would include very cheap designs that do not follow the position 
command at all. The introduction of goal levels therefore focuses the population on the 
most interesting parts of the Pareto frontier. 

The result can be divided into four distinct regions depending on valve type and 
pressure level, see figure 12. 

Fig. 12. Optimization results. In (a) the initial and final population of the optimization is shown. 
In (b) the simulated pulse response for a reasonably fast solution is depicted. Figure (c) shows 
an enlargement of the Pareto front where different regions have been identified based on valve 
and cylinder selections, as shown in (d). The graphs (c) and (d) are obtained using an attribute 
based distance function, whereas (e) and (f) are the corresponding graphs obtained using the 
mixed distance function. 

As can be seen from figure 12, there is a trade-off between system performance 
(control error) and system cost. By accepting a higher cost, better performance can 
be achieved. The cheapest designs consist of small proportional valves and low-pressure 
cylinders. By choosing larger proportional valves and high-pressure cylinders, the 
performance can be increased at the expense of higher cost. If still better performance 
is desired, a servo valve has to be chosen, which is more expensive but has better 
dynamics. 

The continuous parameters, such as the control parameter, tend to smooth out the 
Pareto front. For a given valve and cylinder, different settings of the continuous 
parameters affect the pulse response. A faster response results in a lower control error 
but also a higher energy consumption and thereby higher cost. Therefore, there is a local 
trade-off between cost and performance for each catalog selection. 

4 Discussion 

In the proposed method, new solutions have to compete with the most similar individual 
before they are inserted into the population. Therefore, the similarity measure has 
a great influence on the optimization result. When using the attribute based distance 
function as a similarity measure, the true Pareto optimal front is identified, as shown 
in figure 12 (c) and (d). When using the mixed distance function some dominated 
solutions survive, for example servo valves with low-pressure cylinders, see figure 12 
(e) and (f). These solutions represent local optima, as they dominate the solutions that 
are close in parameter space. 

The obtained results are in accordance with the results from the mathematical test 
functions. An attribute based distance function gives fast convergence to the Pareto 
optimal front, whereas a mixed distance function is a little slower in convergence but 
is capable of finding and maintaining multiple Pareto frontiers, see figure 9. 

For an engineering problem, the optimization formulation is often a simplification 
of the real-world problem, which in part requires human or unquantifiable judgment. 
When deciding upon the final design there are usually more criteria to consider than 
just the optimization objectives. Therefore, knowledge of the existence of local Pareto 
optimal solutions is very valuable. For example, aspects such as robustness, product 
portfolio, maintenance and quality might be important but hard to include in the 
optimization. A local Pareto optimal solution might therefore be preferred to a solution 
at the global Pareto optimal front. Hence, a method that identifies and maintains local 
Pareto optimal solutions is valuable from an engineering perspective. 



5 Conclusions 

In this paper, a new multiobjective genetic algorithm has been presented and applied 
to solve a mathematical test problem as well as a mixed variable design problem. The 
method is capable of finding and maintaining multiple Pareto optimal fronts with a 
minimum of problem specific parameter settings. For the design problem, a hydraulic 
actuation system has been studied with the help of a simulation program. The 
optimization parameters have been divided into continuous parameters and discrete catalog 
selections. For the catalog selections, hierarchical catalogs of valves and cylinders 
have been created using existing components. The optimization results in a set of Pareto 
optimal designs, elucidating the trade-off between system cost and system performance. 
Among the optimal solutions, distinct regions representing different catalog 
choices could be distinguished. 

In future work, comparisons between MOSGA and other multiobjective genetic al- 
gorithms should be performed. We will also develop methods that assess the robust- 
ness of individual solutions and the importance of different design parameters. Such 
methods would facilitate the use of multiobjective optimization in engineering design. 

Acknowledgement. The software for this work used the GAlib genetic algorithm 
package, written by Matthew Wall at the Massachusetts Institute of Technology. 

Abstract. This paper describes an application of Adaptive Range 
Multiobjective Genetic Algorithms (ARMOGAs) to aerodynamic wing 
optimization. The objectives are to minimize the transonic and supersonic drag 
coefficients, as well as the bending and twisting moments of the wing, for a 
supersonic airplane. A total of 72 design variables describe 
the wing’s planform, thickness distribution, and warp shape. ARMOGAs are an 
extension of MOGAs with range adaptation. Four-objective optimization 
was successfully performed. The Pareto solutions are compared with Pareto optimal 
wings obtained by the previous three-objective optimization and with a wing 
designed by the National Aerospace Laboratory (NAL). 



1 Introduction 

Evolutionary Algorithms (EAs) have been widely used to solve real-world 
optimization problems in various fields, aided by the rapid progress of 
computers and of the algorithms themselves. Real-world problems often require a large 
search space, which can slow down convergence and require 
a large number of function evaluations. More efficient and effective algorithms are 
needed, especially in aerodynamic design, where each performance evaluation takes a 
large computational time. Among the EAs coupled with CFD 
(Computational Fluid Dynamics) that have been proposed to seek optimal solutions in 
aerodynamic design (for example, see [1-3]), Adaptive Range Genetic Algorithms (ARGAs) 
offer a unique approach [4]. 

In real-world optimization, multiobjective optimization is often required 
rather than single-objective optimization, since there are generally tradeoffs between 
the various objectives. EAs have attractive advantages for solving 
multiobjective problems. Since EAs seek optimal solutions in parallel, multiple Pareto 
solutions can be obtained simultaneously without specifying weights between 
objectives [5]. In aerodynamic multiobjective optimization, efficient and 
effective algorithms are required to reduce the large computational time. In this study, 
Adaptive Range Multiobjective Genetic Algorithms (ARMOGAs), developed from ARGAs 
for multiobjective optimization, are applied to aerodynamic multiobjective design 
optimization. 

Aerodynamic design for a supersonic transport (SST) is considered in this study. A 
next-generation SST is required to improve the supersonic cruising performance and 
to prevent the sonic boom. However, there is a severe tradeoff between lowering the 
drag and lowering the boom. Therefore, the next-generation SST may cruise at supersonic speed 
only over the sea. This means that it is important to improve not only the supersonic 
performance but also the transonic performance, and thus multipoint aerodynamic 
optimization is needed. In addition to the reduction of both aerodynamic drags, 
structural constraints should be considered to keep the wings from having 
impractically large aspect ratios. 

Three-objective optimization for supersonic wings, which minimized the transonic and 
supersonic drag coefficients and the bending moment at the wing root, was reported 
in [6-7]. In order to consider the viscous effect, a Navier-Stokes solver was used to 
evaluate the aerodynamic performance at both transonic and supersonic conditions 
[7]. Successful results were obtained by the multiobjective optimization. There were 
Pareto solutions that outperformed NAL’s second design in all three objectives, 
and those wings were similar to the “arrow wing” planform. Although the arrow wing 
is known to be good for supersonic aerodynamics, it is known to have aeroelastic and 
control problems due to its large sweep angle. The primary concern is the pitching (twisting) 
moment of the wing. The design results also showed that the second derivative of the 
wing thickness distribution was discontinuous. This led to a further concern for the 
designer about possible boundary layer separation at the maximum thickness 
location. Therefore, the minimization of the pitching moment is added as the 
fourth objective in the present study, together with an improved wing thickness parameterization. 

National Aerospace Laboratory (NAL) in Japan is working on the scaled 
experimental supersonic airplane project (NEXST-I) [8]. A scaled experimental 
supersonic airplane without a propulsion system will be launched with a rocket in 
2002. The airplane will be separated from the rocket after launch and will glide down 
to sample the flight data in the supersonic region. The flight data will be compared 
with the CFD results to validate the reliability and accuracy of CFD predictions. NAL 
designed several configurations for the experimental aircraft. The present Pareto 
solutions obtained are compared with the NAL’s design. In order to verify the present 
optimization method, the present Pareto solutions are also compared with the Pareto 
solutions obtained before. 



2 Adaptive Range Multiobjective Genetic Algorithms 

To reduce the large computational burden, the total number of 
evaluations must be reduced. On the other hand, a large string length is necessary for real 
parameter problems. ARGAs, which were originally proposed by Arakawa and Hagiwara, 
are a unique approach to solving such problems efficiently [9-10]. Oyama 
developed real-coded ARGAs and applied them to transonic wing optimization [4]. 



2.1 Real-Coded Adaptive Range Genetic Algorithms 

The main difference between ARGAs and conventional GAs is the introduction of 
range adaptation. The flowchart of ARGAs is shown in Fig. 1. The population is 
reinitialized every M generations for the range adaptation so that the population 
advances toward promising regions. Another difference is the elimination of the range 
limits, because the design variables are encoded through the normal distribution. 

In the real-coded ARGAs, the real value of the i-th design variable $p_i$ is encoded to a 
real number $r_i$ defined in (0,1) so that $r_i$ is equal to the integral of the normal 
distribution from $-\infty$ to $pn_i$, 

$r_i = \int_{-\infty}^{pn_i} N(0,1)(z) \, dz$    (1) 

$p_i = \sigma_i \cdot pn_i + \mu_i$    (2) 

where the average $\mu_i$ and the standard deviation $\sigma_i$ of the i-th design variable are 
calculated by sampling the top half of the previous population, to move the 
population toward search regions of high fitness. A schematic view of this coding is 
illustrated in Fig. 2. 
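
The coding of Eqs. (1) and (2) can be illustrated with a short Python sketch. It is not the authors' implementation: the function names are hypothetical, the standard library's `NormalDist` stands in for the integral in Eq. (1), and a scalar fitness (larger is better) is assumed when the top half of the population is selected.

```python
from statistics import NormalDist, mean, stdev

_STD_NORMAL = NormalDist(0.0, 1.0)


def decode(r, mu, sigma):
    """Map a gene r in (0, 1) to the real design variable p."""
    pn = _STD_NORMAL.inv_cdf(r)   # invert Eq. (1): r is the normal CDF at pn
    return sigma * pn + mu        # Eq. (2)


def encode(p, mu, sigma):
    """Inverse of decode: real design variable p to a gene r in (0, 1)."""
    return _STD_NORMAL.cdf((p - mu) / sigma)


def adapt_range(population, fitness):
    """Re-estimate (mu, sigma) of every variable from the better half.

    population: list of design vectors (lists of floats)
    fitness:    list of scalars, larger is better (assumption of this sketch)
    """
    order = sorted(range(len(population)), key=lambda k: fitness[k], reverse=True)
    top = [population[k] for k in order[: len(population) // 2]]
    columns = list(zip(*top))
    return [mean(c) for c in columns], [stdev(c) for c in columns]
```

Because a gene close to 0 or 1 decodes to a value far out in the tails of the distribution, the decoded variable has no hard range limits, which matches the remark above about eliminating the range limits.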




Fig. 1. Flowchart of ARGAs 

Fig. 2. Normal distribution is used for encoding in ARGAs 



2.2 Extension of ARGAs to the Multiobjective Problem 

In this study, ARGAs have to deal with multiple Pareto solutions for the 
multiobjective optimization. The basis of ARMOGAs is the same as that of ARGAs, but a 
straightforward extension may cause a problem in the diversity of the population. To 
better preserve the diversity of solution candidates, the normal distribution for 
encoding is changed as shown in Fig. 3. The search region is partitioned into three 
parts (i, ii, iii). Regions i and iii use the same encoding method as ARGAs. 
In contrast, region ii adopts the conventional real-number encoding method. 




Fig. 3. Normal distribution used in ARGAs’ encoding is extended to maintain the diversity of 
the population for ARMOGAs 



3 Problem Definitions 



3.1 Objective Functions 

The objective functions used in this study can be stated as follows: 

1. Drag coefficient at transonic cruise, $C_{D,t}$ 

2. Drag coefficient at supersonic cruise, $C_{D,s}$ 

3. Bending moment at the wing root at supersonic cruise condition, $M_B$ 

4. Twisting moment at supersonic cruise condition, $M_T$ 

In the present optimization, all four objective functions are to be minimized. Both 
the transonic and supersonic drag coefficients are evaluated by a Navier-Stokes 
solver. Both the bending and twisting moments are calculated by directly integrating 
the computed pressure load at the supersonic condition. The bending moment 
represents the lateral moment that acts at the wing root. The twisting moment is the 
pitching moment measured at the leading edge of the root along the line normal to the 
root. The present optimization is performed at two design points, the transonic and 
supersonic cruises. The flow conditions and the target lift coefficients are listed 
below. 

1. Transonic cruising Mach number, $M_{\infty,t} = 0.9$ 

2. Supersonic cruising Mach number, $M_{\infty,s} = 2.0$ 

3. Target lift coefficient at transonic cruising condition, $C_{L,t} = 0.15$ 

4. Target lift coefficient at supersonic cruising condition, $C_{L,s} = 0.10$ 

5. Reynolds number based on the root chord length at both conditions, $Re = 1.0 \times 10^7$ 

To maintain the constant lift constraints, the angle of attack is predicted by using the 
lift-curve slope $C_{L\alpha}$ obtained from a finite difference. Thus, three Navier-Stokes 
computations per evaluation are required. 
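
The count of three flow computations can be made concrete with a small Python sketch. The text does not state which finite-difference formula or step size is used, so a forward difference with a hypothetical step `d_alpha` is assumed here, and `toy_solver` is a made-up linear lift model standing in for the Navier-Stokes code.

```python
def evaluate_at_target_lift(solve, cl_target, alpha0, d_alpha=0.5):
    """Predict the angle of attack that gives the target lift coefficient.

    solve(alpha) -> (CL, CD) stands in for one flow computation. Two calls
    estimate the lift-curve slope by a forward finite difference, and a third
    call evaluates the wing at the predicted angle of attack.
    """
    cl0, _ = solve(alpha0)
    cl1, _ = solve(alpha0 + d_alpha)
    cl_alpha = (cl1 - cl0) / d_alpha                   # finite-difference slope
    alpha = alpha0 + (cl_target - cl0) / cl_alpha      # linear prediction
    cl, cd = solve(alpha)
    return alpha, cl, cd


def toy_solver(alpha):
    """Hypothetical stand-in for the flow solver: linear lift, parabolic drag."""
    cl = 0.05 * alpha + 0.02
    cd = 0.008 + 0.1 * cl * cl
    return cl, cd


alpha, cl, cd = evaluate_at_target_lift(toy_solver, cl_target=0.10, alpha0=1.0)
```

With a truly linear lift curve the prediction is exact. With a real solver the third computation only approximately meets the target lift, which is presumably acceptable near the design point.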



3.2 Design Parameters 

Design variables are categorized into planform, warp shape, and thickness 
distribution. The definitions of the design parameters are the same as in the previous 
optimization except for the thickness distribution. As mentioned earlier, the previous 
thickness definition lacks smoothness at the maximum thickness, as shown in 
Fig. 4 [11]. To improve it, two more control points, which are symmetrical with 
respect to the maximum thickness location, are added as shown in Fig. 5. The 
present definition therefore makes the second derivative continuous at the maximum thickness. 
As a result, 11 control points are used to represent the thickness distribution by a 
Bezier curve at three spanwise sections (root, kink and tip). Linear interpolation is 
used to interpolate the thickness distribution in the spanwise direction. Table 1 describes 
the constraints for the thickness definition. 

The wing planform is determined by six design variables as shown in Fig. 6. Since 
the wing area is fixed, the chord length at the wing tip is determined automatically. 
Constraints and the ranges of the design variables are described in Table 2. The warp shape 
is composed of camber and twist. The camber surface is defined from the airfoil 
camber lines at the inboard and outboard of the wing separately. Each surface is 
represented by a Bezier surface which has four polygons in the chordwise direction 
and three in the spanwise direction. For the wing twist, a B-spline curve with 
six polygons is used. The total number of design parameters is 72. 







Table 1. Constraints for thickness distribution 

Maximum thickness                           $3 < z_{P5} < 4$ 
Maximum thickness location                  $15 < x_{P5} < 70$ 
First derivative constant at P5             $z_{P4} = z_{P5} = z_{P6}$ 
Second derivative constant at P5            $x_{P5} - x_{P3} = x_{P7} - x_{P5}$, $z_{P3} = z_{P7}$ 
First derivative constant at leading edge   $x_{P0} = x_{P1}$ 

Table 2. Constraints for planform shape 

Chord length at root          $10 < C_{root} < 20$ 
Chord length at kink          $3 < C_{kink} < 15$ 
Inboard span length           $2 < b_1 < 7$ 
Outboard span length          $2 < b_2 < 7$ 
Inboard sweep angle (deg)     $35 < \alpha_1 < 70$ 
Outboard sweep angle (deg)    $35 < \alpha_2 < 70$ 
Wing area                     $S = 60$ 

Fig. 6. Wing planform is defined by six design variables. A schematic view of the bending and 
twisting moments is also shown 



3.3 Evaluation by CFD 

Previous results showed the importance of the viscous effect for wing designs. Thus, 
the three-dimensional, compressible, thin layer Navier-Stokes code is again used to 
evaluate aerodynamic performance of a three-dimensional wing at both transonic and 
supersonic conditions. This Navier-Stokes code employs total-variation-diminishing 
type upwind differencing and the lower-upper factored symmetric Gauss-Seidel 
scheme [12]. An algebraic mixing length turbulence model by Baldwin and Lomax is 
adopted [13]. To accelerate the convergence, the multigrid method is also used [14]. 

Taking advantage of the inherent parallelism of GAs, the present optimization is 
parallelized on the SGI ORIGIN 2000 at the Institute of Fluid Science, Tohoku 
University. The system has 640 PE’s with a peak performance of 384 GFLOPS and 640 
GB of memory. The master PE manages the optimization process, while the slave 
PE’s run the Navier-Stokes code. The population size used in this study was set 
to 64, so the process was parallelized on 32-128 PE’s depending on 
availability, because the transonic and supersonic computations can be processed 
separately. It should be noted that the parallel efficiency was almost 100% because almost 
all the CPU time was consumed by the Navier-Stokes computations. The present 
optimization requires about six hours per generation when parallelized on 128 PE’s. 



3.4 Details of the Present ARMOGA 

In this study, the design variables are encoded as real numbers. Blended crossover 
(BLX-α) is adopted as the crossover operator. This crossover method produces children 
on a segment defined by the two parents and a user-specified parameter α. The parameter α is 
set to 0.5 except for the planform design variables. For the six 
planform design variables, α is set to 0.0 to prevent the computational divergence of 
new candidates. After the crossover, mutation takes place with a probability of 20%, 
based on a uniform random number selected over 10% of the range. 
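
A minimal Python sketch of these two operators is given below. It follows the common definition of BLX-α (a child is drawn uniformly from the parents' interval extended by α times its length on each side). The reading of the mutation rule as a uniform perturbation whose total width is 10% of the variable range is an assumption, and all function names are hypothetical.

```python
import random


def blx_alpha(x1, x2, alpha, rng=random):
    """Blended crossover for one variable: sample the extended parent interval."""
    lo, hi = min(x1, x2), max(x1, x2)
    d = hi - lo
    return rng.uniform(lo - alpha * d, hi + alpha * d)


def crossover(parent1, parent2, alphas, rng=random):
    """Apply BLX-alpha variable by variable; alphas holds one alpha per variable
    (for example 0.0 for the planform variables and 0.5 for the others)."""
    return [blx_alpha(a, b, al, rng) for a, b, al in zip(parent1, parent2, alphas)]


def mutate(child, ranges, rate=0.2, width=0.10, rng=random):
    """Perturb each variable with probability `rate` by a uniform random amount
    whose total width is `width` times the variable range (assumed reading)."""
    out = []
    for x, (lo, hi) in zip(child, ranges):
        if rng.random() < rate:
            x += (rng.random() - 0.5) * width * (hi - lo)
        out.append(x)
    return out
```

With α = 0.0 a child always lies between its parents, which is consistent with the stated aim of keeping new planform candidates from diverging.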

Selection is based on the Pareto ranking method and fitness sharing. Each 
individual is assigned a rank according to the number of individuals that dominate 
it. A standard fitness sharing function is used to maintain the diversity of the 
population. The so-called best-N selection is also employed. 
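
The ranking rule can be sketched in a few lines of Python. The offset of one (rank 1 for non-dominated individuals) is an assumed convention, fitness sharing and best-N selection are not covered, and the objective vectors are the four wings listed in Table 3, used here only as sample data.

```python
def dominates(f, g):
    """True if objective vector f dominates g (all objectives are minimized)."""
    return all(a <= b for a, b in zip(f, g)) and any(a < b for a, b in zip(f, g))


def pareto_ranks(objectives):
    """Rank of an individual = 1 + number of individuals that dominate it."""
    return [1 + sum(dominates(g, f) for g in objectives) for f in objectives]


# Objective vectors of the four wings in Table 3.
wings = {
    "Pareto (A)": (0.00998863, 0.01085439, 18.15, 62.35),
    "Pareto (B)": (0.01007195, 0.01093646, 17.39, 60.60),
    "OBJ-3": (0.01004036, 0.01093742, 18.21, 61.00),
    "NAL2nd": (0.01010175, 0.01097646, 18.23, 63.31),
}
ranks = dict(zip(wings, pareto_ranks(list(wings.values()))))
# ranks == {'Pareto (A)': 1, 'Pareto (B)': 1, 'OBJ-3': 1, 'NAL2nd': 4}
```

The three Pareto wings are mutually non-dominated, while the NAL second design is dominated by all three and therefore receives rank 4.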

The population size is set to 64, and the range adaptation is performed every 10 
generations starting from the 15th generation. 

4 Optimization Results 



4.1 Overview of Pareto Solutions 

The evolution was computed for 75 generations. After the computation, all the 
solutions evolved were sorted again to find the final Pareto solutions. The Pareto 
solutions were obtained in four-dimensional objective function space. To understand 
the distribution of Pareto solutions, all Pareto solutions are projected into two- 
dimensional objective function space between transonic and supersonic drag 
coefficients as shown in Fig. 7. In this figure, the Pareto solutions obtained from the 
previous optimization with three design objectives are also plotted. The present Pareto 
front is larger than before; in particular, better tradeoff solutions appear on tradeoff 
surface I. The planform shapes of the extreme Pareto solutions, which minimize the 
respective objective functions, appear physically reasonable, as shown in Fig. 8. The 
wing that minimizes the transonic cruising drag has a smaller leading-edge sweep and 
a larger aspect ratio. In contrast, the wing with the lowest supersonic drag 
coefficient has a large leading-edge sweep so that it remains inside the Mach cone. 



4.2 Influences of the Bending and Twisting Moments to Drag Coefficients 

To examine the influences of the bending and twisting moments, all the present Pareto 
solutions in Fig. 7 are labeled by the bending and twisting moments, respectively, as 
shown in Fig. 9. The wings located near the tradeoff surface between the transonic 
and supersonic drag coefficients (tradeoff surface I, Fig. 7) have impractically large 
bending moments, as shown in Fig. 9 (a). The bending moment is closely related to 
both the transonic and supersonic drag coefficients. On the other hand, the twisting 
moment has an influence only on the supersonic drag coefficient. As a consequence, 
region II in Fig. 7 corresponds primarily to the minimization of the bending 
moment, not to the new objective function, the minimization of the twisting moment. The 
planform shapes that have the lowest bending moment obtained by the present and 
previous optimizations, respectively, are plotted in Fig. 10. Since these planform shapes 
should in principle not differ, the present minimum-bending-moment wing and the wings belonging to 
region II were found thanks to ARMOGA. Similarly, the improvement of the 
present tradeoff surface I over the previous result (Fig. 7) is due to ARMOGA. 

Pareto solutions are also projected onto the two-dimensional plane of the 
supersonic drag coefficient and the twisting moment in Fig. 11. A clear tradeoff is 
found. Figure 11 is also labeled by aspect ratio, but there is no trend in performance 
based on the aspect ratio. 



4.3 Comparison with NAL’s Second Design and the Previous Design 

To examine the quality of the present Pareto solutions, two wings are compared with 
NAL’s second design wing as well as the previous wing obtained by three-objective 
optimization. The NAL SST Design Team has already finished the fourth aerodynamic design 
for the scaled experimental supersonic airplane to be launched in 2002 (NEXST-I). 
To summarize their concepts briefly, the first design determined the planform shape 
among 99 candidates, then the second design was performed by warp optimization 
based on linearized theory. The third design aimed at a natural-laminar-flow (NLF) 
wing by an inverse method using a Navier-Stokes code. Finally, the fourth design was 
performed for a wing-fuselage configuration. Because fully developed turbulence is 
assumed in the present Navier-Stokes computations, it is improper to compare the 
present Pareto solutions to NAL’s NLF wing design. Therefore, the NAL second 
design is chosen for a comparison. 

Table 3 summarizes the aerodynamic performances of four wings compared: two 
present Pareto solutions (A, B), a previous Pareto solution (OBJ-3) and NAL’s second 
design. The aerodynamic performances of NAL’s design and the previous design are computed 
with the same Navier-Stokes solver. All three Pareto solutions are superior to 
NAL’s second design in all four objectives. The wing planform shapes are compared 
in Fig. 12. The present and the previous planform shapes are similar to the 
“arrow wing” type. On the other hand, NAL’s planform is similar to the conventional 
“delta wing” planform. These results indicate that the present arrow wing does not 
necessarily have a large pitching moment, because NAL’s design has a higher pitching 
moment. 

The thickness distributions at the wing root of the three Pareto solutions (A, B, OBJ-3) 
are presented in Fig. 13. In this figure, Pareto solutions A and B have much smoother 
thickness distributions than the previous Pareto solution OBJ-3. The present wings 
do not have a kink in the thickness distribution thanks to the improved 
parameterization, and are less likely to cause boundary layer separation. 



Table 3. Aerodynamic performances of selected four wings 

              $C_{D,t}$     $C_{D,s}$     $M_B$    $M_T$ 
Pareto (A)    0.00998863    0.01085439    18.15    62.35 
Pareto (B)    0.01007195    0.01093646    17.39    60.60 
OBJ-3         0.01004036    0.01093742    18.21    61.00 
NAL2nd        0.01010175    0.01097646    18.23    63.31 



5 Conclusion 

Four-objective aerodynamic optimization of the wings for SST was performed by 
ARMOGA. In addition to the previous objective functions, which are to minimize the 
transonic and supersonic drag coefficients and the bending moment at the wing root, 
the minimization of the twisting moment is added. The number of design variables is 
increased from 66 to 72 to improve the thickness distribution. A Navier-Stokes solver 
is used to evaluate the aerodynamic performances. 

Fig. 7. Projection of the present 4-objective Pareto front onto the transonic and supersonic drag tradeoff. 
Pareto solutions obtained by the previous 3-objective optimization are also plotted. Extreme 
Pareto solutions are indicated. A previous Pareto solution with the minimum bending moment 
is also indicated 




Fig. 8. Typical planform shapes of the extreme Pareto solutions 

Fig. 9. Projection of Pareto front to transonic and supersonic drag tradeoffs labeled by bending 
and twisting moments 




Fig. 10. Comparison of planform shapes having lowest bending moment obtained by the 
present and previous optimizations 

As a result of the optimization, reasonable Pareto solutions were successfully 
obtained. The planform configurations of the extreme Pareto solutions are found to be 
physically reasonable. The resulting Pareto front is better than in the previous 
case thanks to the range adaptation. ARMOGA is confirmed to work well in a large 
search space. By improving the definition of the thickness distribution, more realistic 
thickness distributions are obtained. 

The present Pareto solutions, which are superior to NAL’s second design in all four 
objective functions, are compared with NAL’s wing and an optimal wing obtained 
before. As for the planform, optimal wings are similar to the “arrow wing” type. On 
the other hand, the NAL’s design is similar to the conventional “delta wing” type. It 
also shows that even the arrow wing can reduce the pitching moment below that of 
the NAL second design. The present arrow wing is a good design candidate for the 
next-generation SST. 

Fig. 12. Comparison of planform shapes among selected Pareto solutions and NAL’s design. 
Planform shapes of the present (A, B) and previous results (OBJ-3) are similar to the “arrow 
wing” 

Fig. 13. Comparison of thickness distributions at the wing root among selected Pareto 
solutions. Thickness distributions of Pareto solutions (A, B) are much smoother at the 
maximum thickness location than that of the previous result (OBJ-3) 

Abstract. Evolutionary algorithms to design fuzzy rules from data 
for systems modeling have received much attention in recent literature. 
Many approaches are able to find highly accurate fuzzy models. However, 
these models often contain many rules and are not transparent. 
Therefore, we propose several objectives dealing with transparency and 
compactness besides the standard accuracy objective. These objectives 
are used to find multiple Pareto-optimal solutions with a multi-objective 
evolutionary algorithm in a single run. Attractive models with respect 
to compactness, transparency and accuracy are the result. 

Keywords: Takagi-Sugeno fuzzy model, Pareto optimality, multi- 
objective evolutionary algorithm. 



1 Introduction 

This paper deals with fuzzy model parameter estimation and structure selection. 
In fuzzy model identification, we can, in general, take into account three criteria 
to be optimized: compactness, transparency and accuracy. Different measures 
for these criteria are proposed here. Compactness is related to the size of the 
model, i.e. the number of rules, the number of fuzzy sets and the number of 
inputs for each rule. Transparency is related to linguistic interpretability 
and locality of the rules. Often one is interested in the local behavior of the 
global nonlinear model. Such information can be obtained by constraining the 
model structure during identification. Transparency and model interpretability 
for data-based fuzzy models have received a lot of interest in recent literature. 

Evolutionary Algorithms (EAs) have been recognized as appropriate 
techniques for multi-objective optimization because they perform a search for 
multiple solutions in parallel. EAs have been applied to learn both the 
antecedent and consequent part of fuzzy rules, and models with both fixed and 
varying number of rules have been considered. Also, EAs have been combined 
with other techniques like fuzzy clustering and neural networks. 
This has resulted in many complex algorithms and, as has been recognized in the 
literature, often the transparency and compactness of the resulting rule base is not 
considered to be of importance. In such cases, the fuzzy model becomes a black 
box, and one can question the rationale for applying fuzzy modeling instead 
of other techniques such as neural networks. If the fuzzy model or a neural 
network is handled as a black-box model, it will typically store the information 
in a distributed manner among the neurons or fuzzy sets and their associated 
connectivity. 

Most evolutionary approaches to multi-objective fuzzy modeling consist of 
multiple EAs, usually designed to achieve a single task each, which are applied 
sequentially to obtain a final solution. In these cases each EA optimizes 
the problem with respect to one criterion separately, which is an impediment to 
the global search. Simultaneous optimization of all criteria is more appropriate. 
Other approaches are based on classical multi-objective techniques in which 
multiple objectives are aggregated into a single function to be optimized. In 
this way a single EA obtains a single compromise solution. Current evolutionary 
approaches for multi-objective optimization consist of a single multi-objective 
EA, based on the Pareto optimality notion, in which all objectives are optimized 
simultaneously to find multiple non-dominated solutions in a single run of the 
EA. These approaches can also be considered from the fuzzy modeling perspective. 
The advantage of the classical approach is that no further interaction 
with the decision maker is required; however, it is often difficult to define a good 
aggregation function. If the final solution cannot be accepted, new runs of the 
EA may be required until a satisfying solution is found. The advantages of the 
Pareto approach are that no aggregation function has to be defined, and the 
decision maker can choose the most appropriate solution according to the current 
decision environment at the end of the EA run. Moreover, if the decision 
environment changes, it is not always necessary to run the EA again. Another 
solution may be chosen out of the family of non-dominated solutions that has 
already been obtained. 

In this paper we propose a single multi-objective EA to find, with little 
need for human intervention, multiple non-dominated solutions for fuzzy 
modeling problems. In Sections 2 and 3, fuzzy modeling and the criteria taken into 
account are discussed. The main components of the multi-objective EA are 
described in Section 4. Section 5 proposes several optimization models for fuzzy 
modeling and a decision making strategy. In Section 6, experiments with the EA 
for a test problem are shown and compared with results in the literature. Section 7 
concludes the paper and indicates lines for future research. 

2 Fuzzy Model Identification 



2.1 Fuzzy Model Structure 

We consider rule-based models of the Takagi-Sugeno (TS) type, which are 
especially suitable for the approximation of dynamic systems. The rule consequents 
are often taken to be linear functions of the inputs: 



$R_i$: If $x_1$ is $A_{i1}$ and $\ldots$ $x_n$ is $A_{in}$ then 
$y_i = \zeta_{i1} x_1 + \ldots + \zeta_{in} x_n + \zeta_{i(n+1)}$,  $i = 1, \ldots, M$    (1) 

Here $x = [x_1, x_2, \ldots, x_n]^T$ is the input vector, $y_i$ is the output of the ith rule, 
$A_{ij}$ $(j = 1, \ldots, n)$ are fuzzy sets defined in the antecedent space by membership 
functions $\mu_{A_{ij}}: R \to [0, 1]$, $\zeta_{ij} \in R$ $(j = 1, \ldots, n + 1)$ are the consequent 
parameters, and $M$ is the number of rules. The total output of the model is computed 
by aggregating the individual contributions of the rules: 

$y = \sum_{i=1}^{M} p_i(x) \, y_i$    (2) 



where $p_i(x)$ is the normalized firing strength of the ith rule: 

$p_i(x) = \frac{\prod_{j=1}^{n} \mu_{A_{ij}}(x_j)}{\sum_{k=1}^{M} \prod_{j=1}^{n} \mu_{A_{kj}}(x_j)}$    (3) 



We apply the frequently used trapezoidal membership functions to describe the 
fuzzy sets $A_{ij}$ in the rule antecedents: 

$\mu_{A_{ij}}(x) = \max\left(0, \min\left(\frac{x - a_{ij}}{b_{ij} - a_{ij}}, 1, \frac{d_{ij} - x}{d_{ij} - c_{ij}}\right)\right)$    (4) 
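
To make the model structure concrete, the following Python sketch evaluates a small TS model with trapezoidal membership functions. It is only an illustration under stated assumptions: the rule firing strength is taken as the product of the membership degrees, the trapezoid parameters are assumed to satisfy a < b <= c < d, and the data layout and function names are hypothetical.

```python
def trapezoid(x, a, b, c, d):
    """Trapezoidal membership degree; assumes a < b <= c < d."""
    return max(0.0, min((x - a) / (b - a), 1.0, (d - x) / (d - c)))


def ts_output(x, rules):
    """Output of a Takagi-Sugeno model for the input vector x.

    rules: list of (antecedents, zeta) pairs, where antecedents holds one
    (a, b, c, d) tuple per input and zeta holds the n + 1 consequent
    parameters (the last one is the constant term).
    """
    strengths, outputs = [], []
    for antecedents, zeta in rules:
        beta = 1.0
        for xj, params in zip(x, antecedents):
            beta *= trapezoid(xj, *params)          # product conjunction (assumed)
        strengths.append(beta)
        outputs.append(sum(z * xj for z, xj in zip(zeta, x)) + zeta[-1])
    total = sum(strengths)
    if total == 0.0:
        raise ValueError("input is not covered by any rule")
    # Weighted sum of rule outputs with normalized firing strengths.
    return sum(b * y for b, y in zip(strengths, outputs)) / total


rules = [
    ([(0.0, 1.0, 2.0, 3.0)], (1.0, 0.0)),    # if x is "low"  then y = x
    ([(2.0, 3.0, 4.0, 5.0)], (0.0, 10.0)),   # if x is "high" then y = 10
]
y_low = ts_output([1.5], rules)   # only the first rule fires
y_mid = ts_output([2.5], rules)   # both rules fire with equal strength
```

In the overlap region the output is a blend of the two local linear models, which is the property that makes the local behavior of a TS model easy to inspect.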



2.2 Multi-objective Identification 

Identification of fuzzy models from data requires the presence of multiple criteria 
in the search process. In multi-objective optimization, the set of solutions is 
composed of all those elements of the search space for which the corresponding 
objective vector cannot be improved in any dimension without degradation in 
another dimension. These solutions are called non-dominated or Pareto-optimal. 
Given two decision vectors a and b in a universe U, a is said to dominate b if 
$f_i(a) \le f_i(b)$ for all objective functions $f_i$, and $f_j(a) < f_j(b)$ for at least one 
objective function $f_j$, for minimization. A decision vector $a \in U$ is said to be 
Pareto-optimal if no other decision vector dominates a. 

The Pareto-optimality concept should be integrated within a decision process 
in order to select a suitable compromise solution from all non-dominated alternatives. 
In a decision process, the decision maker expresses preferences which 
should be taken into account to identify preferable non-dominated solutions. In 
this way, preference articulation implicitly defines a utility function which 
discriminates between candidate solutions. Approaches based on weights, goals and 
priorities have been used most often. Moreover, preference articulation can be 
achieved in different ways depending on how the computation and the decision 
processes are combined in the search for compromise solutions. Three broad 
classes can be identified: a priori, a posteriori, and progressive articulation of 
preferences. 

2.3 Rule Set Simplification Techniques 

Automated approaches to fuzzy modeling often introduce redundancy in terms 
of several similar fuzzy sets that describe almost the same region in the domain of 
some variable. According to some similarity measure, two or more similar fuzzy 
sets can be merged to create a new fuzzy set representative of the merged sets. 
This new fuzzy set substitutes the ones merged in the rule base. The merging 
process is repeated until the fuzzy sets for each model variable cannot be merged, 
i.e., they are not similar. This simplification may result in several identical rules, 
which are removed from the rule set. 
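
The repeated merging of similar fuzzy sets in one input domain can be sketched in Python as follows. This is an illustration under assumptions rather than the authors' code: the similarity is taken to be the ratio of the intersection area to the union area, approximated on a sampled grid, the trapezoids are assumed to satisfy a < b <= c < d, and the function names are hypothetical. The merging rule and the threshold 0.6 follow this section.

```python
def trapezoid(x, a, b, c, d):
    """Trapezoidal membership degree; assumes a < b <= c < d."""
    return max(0.0, min((x - a) / (b - a), 1.0, (d - x) / (d - c)))


def similarity(A, B, samples=1001):
    """Intersection over union of two trapezoids, sampled on a grid (assumed)."""
    lo, hi = min(A[0], B[0]), max(A[3], B[3])
    inter = union = 0.0
    for k in range(samples):
        x = lo + (hi - lo) * k / (samples - 1)
        ma, mb = trapezoid(x, *A), trapezoid(x, *B)
        inter += min(ma, mb)
        union += max(ma, mb)
    return inter / union if union > 0.0 else 0.0


def merge(A, B, alpha=0.5):
    """Merge two trapezoids (a, b, c, d) into one, weighting the cores by alpha."""
    return (min(A[0], B[0]),
            alpha * A[1] + (1.0 - alpha) * B[1],
            alpha * A[2] + (1.0 - alpha) * B[2],
            max(A[3], B[3]))


def simplify(sets, threshold=0.6, alpha=0.5):
    """Merge the most similar pair repeatedly until no pair exceeds the threshold."""
    sets = list(sets)
    while len(sets) > 1:
        pairs = ((similarity(sets[i], sets[j]), (i, j))
                 for i in range(len(sets)) for j in range(i + 1, len(sets)))
        best, (i, j) = max(pairs, key=lambda t: t[0])
        if best <= threshold:
            break
        merged = merge(sets[i], sets[j], alpha)
        sets = [s for k, s in enumerate(sets) if k not in (i, j)] + [merged]
    return sets
```

After the sets of each input have been simplified, rules that refer to the same merged sets become identical and can be removed, as described above.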

We consider the following similarity measure between two fuzzy sets A and B: 

$S(A, B) = \frac{|A \cap B|}{|A \cup B|}$    (5) 

If $S(A, B) > \theta_S$ (we use $\theta_S = 0.6$) then fuzzy sets A and B are merged in a 
new fuzzy set C as follows: 

$a_C = \min\{a_A, a_B\}$ 
$b_C = \alpha b_A + (1 - \alpha) b_B$ 
$c_C = \alpha c_A + (1 - \alpha) c_B$ 
$d_C = \max\{d_A, d_B\}$    (6) 

where $\alpha \in [0, 1]$ determines the influence of A and B on the new fuzzy set C. 

3 Criteria for Fuzzy Modeling 

We consider three main criteria to search for an acceptable fuzzy model: (i) 
accuracy, (ii) transparency, and (iii) compactness. It is necessary to define 
quantitative measures for these criteria by means of appropriate objective functions 
which define the complete fuzzy model identification. 

The accuracy of a model can be measured with the mean squared error: 

$MSE = \frac{1}{K} \sum_{k=1}^{K} (y_k - \hat{y}_k)^2$    (7) 

where $y_k$ is the true output and $\hat{y}_k$ is the model output for the kth input vector, 
respectively, and K is the number of data samples. 

Many measures are possible for the second criterion, transparency. Nevertheless, 
in this paper we only consider one of the most significant, similarity, as a 
starting point. The similarity S among distinct fuzzy sets in each variable of the 
fuzzy model can be expressed as follows: 

$S = \max_{i, j, k;\ A_{ij} \ne A_{ik}} S(A_{ij}, A_{ik})$,  $i = 1, \ldots, n$, $j = 1, \ldots, M$, $k = 1, \ldots, M$    (8) 

This is an aggregated similarity measure for the fuzzy rule-based model with 
the objective to minimize the maximum similarity between the fuzzy sets in each 
input domain. 

Finally, measures for the third criterion, the compactness, are the number of 
rules M and the number of different fuzzy sets L of the fuzzy model. We assume 
that models with a small number of rules and fuzzy sets are compact. 

In summary, we have considered three criteria for fuzzy modeling, and we 
have defined the following measures for these criteria: 



Criteria        Measures
Accuracy        MSE
Transparency    S
Compactness     M, L



4 Multi-objective Evolutionary Algorithm 

The main characteristics of the Multi-Objective Evolutionary Algorithm are the 
following: 

1. The proposed algorithm is a Pareto-based multi-objective EA for fuzzy 
modeling, i.e., it has been designed to find, in a single run, multiple non- 
dominated solutions according to the Pareto decision strategy. There is no 
dependence between the objective functions and the design of the EA, thus, 
any objective function can easily be incorporated. Without loss of generality, 
the EA minimizes all objective functions. 

2. Constraints with respect to the fuzzy model structure are satisfied by incor- 
porating specific knowledge about the problem. The initialization procedure 
and variation operators always generate individuals that satisfy these con- 
straints. 

3. The EA has a variable-length, real-coded representation. Each individual of 
a population contains a variable number of rules between 1 and max, where 
max is defined by a decision maker. Fuzzy numbers in the antecedents and 
the parameters in the consequent are coded by floating-point numbers. 

4. The initial population is generated randomly with a uniform distribution 
within the boundaries of the search space, defined by the learning data and 
model constraints. 
5. The EA searches among simplified rule sets, i.e., all individuals in the
population have been previously simplified (after initialization and variation),
which is an added ad hoc technique for transparency and compactness. Thus,
all individuals in the population have a similarity S between 0 and 0.6.

6. Chromosome selection and replacement are achieved by means of a variant of
the preselection scheme. This technique is, implicitly, a niche formation
technique and an elitist strategy. Moreover, an explicit niche formation technique
has been added to maintain diversity with respect to the number of rules of the
individuals. Survival of individuals is always based on the Pareto concept.

7. The EA's variation operators act on the individuals at different levels: (i)
the rule set level, (ii) the rule level, and (iii) the parameter level.



4.1 Representation of Solutions and Constraint Satisfaction 

An individual I for this problem is a rule set of M rules as follows:

$$\begin{array}{l} R_1: A_{11} \dots A_{1n} \;\; \zeta_{11} \dots \zeta_{1n} \; \zeta_{1(n+1)} \\ \quad \vdots \\ R_M: A_{M1} \dots A_{Mn} \;\; \zeta_{M1} \dots \zeta_{Mn} \; \zeta_{M(n+1)} \end{array}$$

The constraints on the domain of the variables of a fuzzy model are given
by the semantics of a fuzzy number. Thus, a fuzzy number $A_{ij}$ ($i = 1, \dots, M$,
$j = 1, \dots, n$) can be represented by means of four real values $a_{ij}, b_{ij}, c_{ij}, d_{ij} \in [l_j, u_j]$,
with $a_{ij} < b_{ij} < c_{ij} < d_{ij}$. The consequent parameters are also real
values constrained by a domain, i.e., $\zeta_{ij} \in [l, u]$ ($i = 1, \dots, M$, $j = 1, \dots, n+1$).

Another constraint is related to the number of rules M of the model, which
must lie between a lower bound of 1 and an upper bound max fixed by the
decision maker.

In the following sections we describe easy initialization and variation proce- 
dures to generate random individuals which satisfy these constraints. 



4.2 Initial Population 

The initial population is completely random, except that the number of individuals
with M rules, for all $M \in [1, max]$, should be between minNS and maxNS
to ensure diversity with respect to the number of rules, where minNS and maxNS,
with $0 < minNS < maxNS < PS$ (PS is the population size), are the
minimum and maximum niche size, respectively (see next subsection).

To generate an individual with M rules, the procedure is as follows: for each
trapezoidal fuzzy number $A_{ij}$ ($i = 1, \dots, M$, $j = 1, \dots, n$), four random real
values from $[l_j, u_j]$ are generated and sorted to satisfy the constraints
$a_{ij} < b_{ij} < c_{ij} < d_{ij}$. The parameters $\zeta_{ij}$ ($i = 1, \dots, M$, $j = 1, \dots, n+1$) are real values
generated at random from $[l, u]$. Afterwards, the individual is simplified according to
the procedure described in Section 2.3.
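
The generation of a random individual described above can be sketched as follows. The function names and the tuple-based data layout are assumptions made for this illustration only, and the final simplification step (Section 2.3) is omitted.

```python
import random


def random_trapezoid(lo, hi, rng):
    """Four uniform random values from [lo, hi], sorted so that a <= b <= c <= d."""
    return tuple(sorted(rng.uniform(lo, hi) for _ in range(4)))


def random_rule(bounds, cons_lo, cons_hi, rng):
    """One rule: n trapezoidal antecedents plus n + 1 consequent parameters."""
    antecedent = [random_trapezoid(lo, hi, rng) for lo, hi in bounds]
    consequent = [rng.uniform(cons_lo, cons_hi) for _ in range(len(bounds) + 1)]
    return antecedent, consequent


def random_individual(m, bounds, cons_lo, cons_hi, rng):
    """A rule set with m rules; `bounds` holds one (l_j, u_j) pair per input."""
    return [random_rule(bounds, cons_lo, cons_hi, rng) for _ in range(m)]
```

Sorting the four sampled values is what guarantees the ordering constraint on the trapezoid parameters without any rejection step.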
4.3 Selection and Generational Replacement

We use a variant of the preselection scheme, which has been one of the results
of previous work on general constrained multi-objective optimization problems
solved by EAs.

In each iteration of the EA, two individuals are picked at random from the
population. These individuals are crossed nChildren times and the children are
mutated, producing 2 · nChildren offspring. Afterwards, the best of the first offspring
replaces the first parent, and the best of the second offspring replaces the second
parent, only if:

- the offspring is better than the parent, and

- the number of rules of the offspring is equal to the number of rules of the
parent, or the niche count of the parent is greater than minNS and the
niche count of the offspring is smaller than maxNS.

An individual I is better than another individual J if I dominates J. The
best individual of a collection is any individual I such that there is no other
individual J which dominates I. The niche count of an individual I is the number
of individuals in the population with the same number of rules as I.
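
The replacement rule above can be sketched as follows, assuming that every objective is minimized and that each individual is summarized by its objective vector and its number of rules. All function names are hypothetical and chosen only for this illustration.

```python
from collections import Counter


def dominates(f, g):
    """True if objective vector f Pareto-dominates g (all objectives minimized)."""
    return all(x <= y for x, y in zip(f, g)) and any(x < y for x, y in zip(f, g))


def best_of(objectives):
    """Index of any member of a non-empty collection that no other member dominates."""
    for i, f in enumerate(objectives):
        if not any(dominates(g, f) for j, g in enumerate(objectives) if j != i):
            return i
    raise ValueError("empty collection")


def niche_counts(rule_counts):
    """Number of individuals in the population for each number of rules."""
    return Counter(rule_counts)


def may_replace(child_f, child_m, parent_f, parent_m, counts, min_ns, max_ns):
    """Preselection test: the child must dominate the parent and keep niche sizes legal."""
    if not dominates(child_f, parent_f):
        return False
    if child_m == parent_m:
        return True  # niche sizes are unchanged by the replacement
    return counts[parent_m] > min_ns and counts[child_m] < max_ns
```

The second condition only matters when the child and the parent have different numbers of rules, since only then does the replacement shrink one niche and grow another.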

Note that the preselection scheme is an implicit niche formation technique that
maintains diversity in the population because an offspring replaces an individual
similar to itself (one of its parents). Implicit niche formation techniques are
more appropriate for fuzzy modeling than explicit techniques, such as the sharing
function, which can lead to excessive computation times. However, we need
an additional mechanism for diversity with respect to the number of rules of the
individuals in the population. One of the reasons is that the number of rules is
an integer parameter and the variation operators can generate individuals whose
numbers of rules differ considerably from those of the parents. The preselection
scheme is not effective in such a case. The added explicit niche formation technique
ensures that the number of individuals with M rules, for all $M \in [1, max]$, is greater
than or equal to minNS and smaller than or equal to maxNS. Moreover, the preselection
scheme is also an elitist strategy because the best individual in the population
is replaced only by a better one.

4.4 Variation Operators 

As already said, an individual is a set of M rules. A rule is a collection of 
n fuzzy numbers (antecedent) plus n + 1 real parameters (consequent), and a 
fuzzy number is composed of four real numbers. In order to achieve an appropri- 
ate exploitation and exploration of the potential solutions in the search space, 
variation operators working in the different levels of the individuals are neces- 
sary. In this way, we consider three levels of variation operators: rule set level, 
rule level, and parameter level. After a sequence of crossovers and mutations,
the offspring are simplified according to the rule set simplification procedure
described previously.

Five crossover and four mutation operators are used in the EA. In the following,
$\alpha \in [0, 1]$ is a random number drawn from a uniform distribution.
Rule Set Level Variation Operators

- Crossover 1: Given two parents $I_1 = (R^1_1 \dots R^1_{M_1})$ and $I_2 = (R^2_1 \dots R^2_{M_2})$,
this operator exchanges information about the number of rules of the parents
and information about the rules of the parents, but no rule is internally
crossed. Two children are produced: $I_3 = (R^1_1 \dots R^1_a \, R^2_1 \dots R^2_b)$ and
$I_4 = (R^1_{a+1} \dots R^1_{M_1} \, R^2_{b+1} \dots R^2_{M_2})$, where $a = round(\alpha \cdot M_1 + (1 - \alpha) \cdot M_2)$ and
$b = round((1 - \alpha) \cdot M_1 + \alpha \cdot M_2)$. The number of rules of the children is
between $M_1$ and $M_2$.

- Crossover 2: This operator increases the number of rules of the two children
as follows: the first child contains all $M_1$ rules of the first parent and
$\min\{max - M_1, M_2\}$ rules of the second parent; the second child contains
all $M_2$ rules of the second parent and $\min\{max - M_2, M_1\}$ rules of the first
parent.

- Mutation 1: This operator deletes or adds, both with equal probability,
one rule in the rule set. For deletion, one rule is randomly deleted from the
rule set. For addition, one rule is randomly generated, according to the
initialization procedure described, and added to the rule set.
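
As an illustration of the rule set level, Crossover 2 can be sketched as follows. The text does not say which rules are taken from the other parent, so drawing them at random without replacement is an assumption of this sketch, as is the name `crossover2`.

```python
import random


def crossover2(parent1, parent2, max_rules, rng):
    """Each child keeps all rules of one parent and adds rules of the other.

    Assumes both parents already have at most `max_rules` rules. The added
    rules are sampled at random, which is an assumption of this sketch.
    """
    k1 = min(max_rules - len(parent1), len(parent2))
    k2 = min(max_rules - len(parent2), len(parent1))
    child1 = list(parent1) + rng.sample(list(parent2), k1)
    child2 = list(parent2) + rng.sample(list(parent1), k2)
    return child1, child2
```

Because of the `min` term, neither child can exceed the upper bound max on the number of rules.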



Rule Level Variation Operators 

- Crossover 3: Given two parents $I_1 = (R^1_1 \dots R^1_i \dots R^1_{M_1})$ and
$I_2 = (R^2_1 \dots R^2_j \dots R^2_{M_2})$, this operator produces two children
$I_3 = (R^1_1 \dots R^3_i \dots R^1_{M_1})$ and $I_4 = (R^2_1 \dots R^4_j \dots R^2_{M_2})$, with
$R^3_i = \alpha R^1_i + (1 - \alpha) R^2_j$ and $R^4_j = \alpha R^2_j + (1 - \alpha) R^1_i$,
where i, j are random indexes from $[1, M_1]$ and $[1, M_2]$, respectively.

- Crossover 4: Given two parents $I_1 = (R^1_1 \dots R^1_i \dots R^1_{M_1})$ and
$I_2 = (R^2_1 \dots R^2_j \dots R^2_{M_2})$, this operator produces two children
$I_3 = (R^1_1 \dots R^3_i \dots R^1_{M_1})$ and $I_4 = (R^2_1 \dots R^4_j \dots R^2_{M_2})$, where
$R^3_i$ and $R^4_j$ are obtained with the uniform crossover.

- Mutation 2: This operator removes a randomly chosen rule and inserts a
new one which is randomly generated by the rule-initialization procedure.



Parameter Level Variation Operators 

- Crossover 5: Given two parents, and one rule of each parent randomly
chosen, this operator crosses the fuzzy numbers corresponding to a random
input variable or the consequent parameters. The crossover is arithmetic.

- Mutation 3: This operator mutates a random fuzzy number or the consequent
of a random rule. The new fuzzy number or consequent is generated
at random.

- Mutation 4: This operator changes the value of one of the parameters
a, b, c or d of a random antecedent fuzzy number, or a parameter of the
consequent of a randomly chosen rule. The new value of the parameter is
generated at random within the constraints by a non-uniform mutation.
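
The arithmetic crossover used at the rule and parameter levels can be sketched for a single parameter vector as follows. The function name is hypothetical; the vectors could be the (a, b, c, d) parameters of two trapezoidal fuzzy numbers or two consequent parameter vectors of equal length.

```python
def arithmetic_crossover(x, y, alpha):
    """Return the two convex combinations of parameter vectors x and y.

    alpha is assumed to be a uniform random number from [0, 1].
    """
    if len(x) != len(y):
        raise ValueError("parameter vectors must have the same length")
    child1 = tuple(alpha * xi + (1 - alpha) * yi for xi, yi in zip(x, y))
    child2 = tuple(alpha * yi + (1 - alpha) * xi for xi, yi in zip(x, y))
    return child1, child2
```

A convex combination of two sorted vectors is itself sorted (up to floating-point rounding), so applying this operator to two valid trapezoids keeps the ordering of a, b, c and d and keeps the children inside the variable's domain.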
5 Optimization Models and Decision Making

After preliminary experiments in which we checked different optimization
models, the following remarks can be made:

1. The minimization of the number of rules M of the individuals has a negative
influence on the evolution of the algorithm. The reason is that this parameter
is not an independent variable to optimize, as the amount of information in
the population decreases when the average number of rules is low, which
is not good for exploration. Therefore, we do not minimize the number of rules
during the optimization, but we take it into account at the end of the
run, in an a posteriori articulation of preferences applied to the last population.

2. It is very important to note that a very transparent model will not be
accepted by a decision maker if the model is not accurate. In most fuzzy
modeling problems, excessively low values for similarity hamper accuracy, for
which reason these models are normally rejected. Alternative decision strategies,
such as goal programming, enable us to reduce the domain of the objective functions
according to the preferences of a decision maker. Thus, we can impose a goal
$g_S$ for similarity, which stops the minimization of the similarity in solutions for
which the goal $g_S$ has been reached.

3. The measure L (number of different fuzzy sets) is considerably reduced by 
the rule set simplification technique. So, we do not define an explicit objective 
function to minimize L. 



According to the previous remarks, we finally consider the following two
optimization models:

Optimization Model 1:

$$\text{Minimize } f_1 = MSE, \qquad \text{Minimize } f_2 = S \qquad (9)$$

Optimization Model 2:

$$\text{Minimize } f_1 = MSE, \qquad \text{Minimize } f_2 = \max\{g_S, S\} \qquad (10)$$



At the end of the run, we consider the following a posteriori articulation
of preferences applied to the last population to obtain the final compromise
solution:

1. Identify the set $X^* = \{x^*_1, \dots, x^*_p\}$ of non-dominated solutions according to:

$$\text{Minimize } f_1 = MSE, \qquad \text{Minimize } f_2 = S, \qquad \text{Minimize } f_3 = M \qquad (11)$$

2. Choose from $X^*$ the most accurate solution $x^*_i$; remove $x^*_i$ from $X^*$.

3. If solution $x^*_i$ is not accurate enough or there is no solution in the set $X^*$,
then STOP (no solution is satisfactory).

4. If solution $x^*_i$ is not transparent or compact enough, then go to step 2.

5. Show the solution $x^*_i$ as output.

Computer-aided inspection, as shown in Figure 3, can help in the decisions for steps
2 and 3.
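
The a posteriori decision procedure can be sketched as a simple loop. In this illustration each solution is a tuple (MSE, S, M), and the two acceptance tests are passed in as callables that stand in for the decision maker; both the representation and the names are assumptions.

```python
def dominates(f, g):
    """True if objective vector f Pareto-dominates g (all objectives minimized)."""
    return all(x <= y for x, y in zip(f, g)) and any(x < y for x, y in zip(f, g))


def choose_compromise(population, accurate_enough, transparent_and_compact):
    """Return the chosen (MSE, S, M) tuple, or None if no solution is satisfactory."""
    # Step 1: non-dominated solutions with respect to MSE, S and M.
    front = [s for s in population
             if not any(dominates(t, s) for t in population if t is not s)]
    # Steps 2 to 4: visit the candidates from most to least accurate.
    for candidate in sorted(front, key=lambda s: s[0]):
        if not accurate_enough(candidate):
            return None  # step 3: all remaining candidates are even less accurate
        if transparent_and_compact(candidate):
            return candidate  # step 5
    return None  # step 3: the set X* is exhausted
```

For example, with an accuracy threshold on the MSE and limits on S and M, the loop returns the most accurate non-dominated model that also meets the transparency and compactness limits.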



6 Experiments and Results 

Consider the second-order nonlinear plant studied by Wang and Yen:

$$y(k) = g(y(k-1), y(k-2)) + u(k) \qquad (12)$$

with

$$g(y(k-1), y(k-2)) = \frac{y(k-1)\, y(k-2)\, (y(k-1) - 0.5)}{1 + y^2(k-1)\, y^2(k-2)} \qquad (13)$$

The goal is to approximate the nonlinear component $g(y(k-1), y(k-2))$
of the plant with a fuzzy model. As in that study, 400 simulated data points were
generated from the plant model (12). Starting from the equilibrium state (0, 0),
200 samples of identification data were obtained with a random input signal
u(k) uniformly distributed in [-1.5, 1.5], followed by 200 samples of evaluation
data obtained using a sinusoidal input signal $u(k) = \sin(2\pi k/25)$. The resulting
signals and the real surface are shown in Figure 1.
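
The data generation described above can be sketched as follows. The random seed, the function names, and the choice to restart the index k of the sinusoidal input at 0 for the evaluation segment are assumptions of this illustration and are not stated in the text.

```python
import math
import random


def g(y1, y2):
    """Nonlinear component of the plant, Eq. (13); y1 = y(k-1), y2 = y(k-2)."""
    return y1 * y2 * (y1 - 0.5) / (1.0 + y1 ** 2 * y2 ** 2)


def simulate(inputs, y1=0.0, y2=0.0):
    """Run Eq. (12) from the state (y1, y2); one row (y(k-1), y(k-2), g, y(k)) per input."""
    rows = []
    for u in inputs:
        gk = g(y1, y2)
        yk = gk + u
        rows.append((y1, y2, gk, yk))
        y1, y2 = yk, y1  # shift the two-step output history
    return rows


def generate_data(seed=0):
    """200 identification inputs (uniform) followed by 200 evaluation inputs (sine)."""
    rng = random.Random(seed)
    u_ident = [rng.uniform(-1.5, 1.5) for _ in range(200)]
    u_eval = [math.sin(2.0 * math.pi * k / 25.0) for k in range(200)]
    inputs = u_ident + u_eval
    return inputs, simulate(inputs)
```

The first three entries of each row are the regressors and target for approximating g, while the last entry is the plant output that feeds the next step.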




Fig. 1. Left: Input u(k), unforced system g(k), and output y(k) of the plant in (12).
Right: Real surface.




The following values for the parameters of the EA were used in the simulations:
population size 100, crossover probability 0.8, mutation probability 0.4,
number of children for the preselection scheme 10, minimum number of individuals
for each number of rules 5, and maximum number of individuals for each
number of rules 20. All crossover and mutation operators are applied with the
same probability. The EA stops when the solutions satisfy the decision maker.

We show results obtained with the EA by using the optimization models (9)
(max = 5) and (10) (max = 5, $g_S$ = 0.25). Figure 2 shows the non-dominated
solutions in the last population according to (11) for both optimization models
(9) and (10). One can appreciate the effectiveness of the preselection technique
and the added explicit niche formation technique in maintaining diversity in the
populations. The main difference between the optimization models (9) and (10)
is that with (9) the EA obtains more diversity but the fuzzy models are less
accurate. Goal-based models have the disadvantage that it is necessary to choose,
a priori, a good goal for the problem, although this value is representative of the
maximum degree of overlapping of the fuzzy sets allowed by a decision maker.

According to the described decision process, we finally choose a compromise
solution, shown in Figure 3 by means of different graphics for the obtained
model. Figure 3(a) shows the local model. The surface generated by the model
is shown in Figure 3(b), the fuzzy sets for each variable are shown in Figure 3(c),
and finally, the identification and validation results as well as the prediction error
are shown in Figure 3(d).

We compared our results, with those obtained by the four different ap- 
proaches proposed in ES] and m- The best results obtained for in each case 
are summarized in Table ^ with an indication of the number of rules, number 
of different fuzzy sets, consequent type, and obtained MSE for training and 
evaluation data. In m. the low MSE on the training data is in contrast with 
the MSE for the evaluation data which indicates overtraining. The solution in 
is similar to the solutions in this paper with respect to the accuracy, trans- 
parency and compactness, but hybrid techniques (initial fuzzy clustering and a 
sequence of specific genetic algorithms) were required in [26]. The solutions in this
paper are obtained with a single EA and have been chosen from among different
alternatives, which is an advantage for an appropriate decision process.



Table 1. Fuzzy models for the dynamic plant. All models are of the Takagi-Sugeno
type.

Ref.           No. of rules           No. of sets        Consequent   MSE train     MSE eval
[?]            36 rules (initial)     12 (B-splines)     Linear       1.9 · 10^-?   2.9 · 10^-?
               24 rules (optimized)                      Linear       2.0 · 10^-?   6.4 · 10^-?
[26]           7 rules (initial)      14 (triangular)    Linear       1.8 · 10^-?   1.0 · 10^-?
               5 rules (optimized)    5 (triangular)     Linear       5.0 · 10^-?   4.2 · 10^-?
This paper^1   5 rules                5 (trapezoidal)    Linear       2.0 · 10^-?   1.3 · 10^-?
This paper^2   5 rules                6 (trapezoidal)    Linear       5.9 · 10^-?   8.8 · 10^-?

^1 Solution corresponds to the solution marked with * in Figure 2 (left) and Figure 3 (left).

^2 Solution corresponds to the solution marked with * in Figure 2 (right) and Figure 3 (right).
Fig. 2. Left: Non-dominated solutions according to (11) obtained with the Pareto-
based multi-objective EA by using the optimization model (9). Right: Non-dominated
solutions according to (11) obtained with the Pareto-based multi-objective EA by using
the optimization model (10). The solution marked with * is the final compromise solution.
Horizontal axis in both panels: mean squared error (MSE).



7 Conclusions and Future Research 

This paper reports some initial results on the combination of Pareto-based
multi-objective evolutionary algorithms and fuzzy modeling. Criteria such as accuracy,
transparency and compactness have been taken into account in the optimization
process. Some of these criteria have been partially incorporated into the EA by
means of ad hoc techniques, such as rule set simplification techniques. An implicit
niche formation technique (preselection), in combination with other explicit
techniques with low computational costs, has been used to maintain diversity.
These niche formation techniques are appropriate in fuzzy modeling when large
amounts of data are required. Excessive computational times would result if the
sharing function were used. Elitism is also implemented by means of the preselection
technique. A goal-based approach has been proposed to help obtain more accurate
fuzzy models. The results obtained are good in comparison with other, more
complex techniques reported in the literature, with the advantage that the proposed
technique identifies a set of alternative solutions. We also proposed an easy decision
process with a posteriori articulation of preferences to finally choose a
compromise solution.

Fig. 3. Accurate, transparent and compact fuzzy models for the plant model (12). Left:
Final compromise solution obtained with the Pareto-based multi-objective EA by using
the optimization model (9) and a posteriori decision process. Right: Final compromise
solution obtained with the Pareto-based multi-objective EA by using the optimization
model (10) and a posteriori decision process. Panels: (a) local model, (b) surface,
(c) fuzzy sets, (d) identification and validation, and error prediction.

One of the main differences between the proposed EA and other approaches 
for fuzzy modeling is the reduced complexity because we use a single EA for 
generating, tuning and simplification processes. Moreover, human intervention 
is only required at the end of the run to choose one of the multiple non-dominated 
solutions found by the EA. 

In future work we will consider other, more complex fuzzy modeling
test problems in order to check the robustness of the EA; other measures to optimize
transparency, e.g., similarity in the consequent domain instead of, or together
with, the antecedent domain; the scalability of the algorithm; and applications in
the real world by means of research projects.
Abstract. A multi-objective evolutionary algorithm is used to determine
the membership function distribution within the outer loop control
system of a non-linear missile autopilot using lateral acceleration con- 
trol. This produces a design that meets objectives related to closed loop 
performance such as: steady state error, overshoot, settling and rising 
time. The evolutionary algorithm uses non-dominated sorting for form- 
ing a Pareto front of possible solutions. This paper shows that fuzzy 
controllers can be produced for engineering problems, with the multi- 
objective algorithm allowing the designer the freedom to choose solutions 
and investigate the properties of very complex systems. 



1 Introduction 

One of the main requirements for an autopilot design is to yield as fast a
response as possible with the minimum of overshoot, so that any command is attained
quickly and is of the required magnitude. For low g demands, only a slight
overshoot of short duration is usually acceptable, since it can compensate for loss
of acceleration during the initial transient. For high g demands, overshoot is
usually unacceptable since the airframe structural load limit may be exceeded.

In order that the autopilot yields an accurate and instantaneous reaction, 
it is very important to assess the quality of the lateral acceleration response 
which is quantified in terms of rise time, settling time, maximum percentage 
overshoot with almost no steady state error. This means that while tuning the 
trajectory control parameters, the optimisation process should consider those 
four criteria simultaneously. Hence the single optimisation problem has become 
multi-objective, being able to provide the designer with multiple solutions. The 
four criteria are conflicting in their nature and a compromise solution may well 
be taken. 

The aim of this paper is to track the missile lateral acceleration demand in
the presence of uncertainties introduced through the aerodynamic coefficients.
The g demands are considered for both pitch and yaw planes, using the missile
rudder and elevator as control surfaces, hence yielding a system with 2 inputs
and 2 controlled outputs.
Feedback linearisation of the autopilot is combined with a fuzzy logic trajectory
controller to provide control of the missile lateral acceleration. A multi-objective
evolutionary algorithm based on non-dominated sorting is used to
evolve the fuzzy controllers. In previous research, a combination of an
input/output linearisation technique (nonlinear control law) and a fuzzy logic
trajectory controller has been considered for velocity control. This paper extends
the techniques to direct control of acceleration.

2 Fuzzy Trajectory Controller 

The autopilot design system shown in Fig. 1 consists of the missile model and
autopilot simulation feeding the performance analysis and objective generation
functions, allowing the evolutionary algorithm to optimise the fuzzy control
surfaces.




The missile model is surrounded by an inner feed-back linearisation loop, 
with outer loops containing the fuzzy trajectory controllers for the pitch and 
yaw channels. The fuzzy controllers for pitch and yaw are identical as the system 
is symmetrical, therefore only one fuzzy surface needs to be evolved. 

The missile dynamics are represented by

$$\dot{x} = f(x) + \Delta f(x) + (g(x) + \Delta g(x))u, \qquad y = h(x) \qquad (1)$$

which represents the multi-modelling framework. A fast (250 rad/s) second-order
linear actuator is included within the missile dynamics. Fin angles and fin rates
are states of the system. The non-linear control law is derived by the
feedback linearisation technique, which decouples the system.

The trajectory controller has been designed, based on a fuzzy inference engine,
as a two-input, one-output system with five membership functions for each
variable. The positions of the membership functions and the rules are generated
using an evolutionary algorithm.

3 Missile Model 

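The missile model used in this study derives from a non-linear model produced
by Horton of Matra-British Aerospace. It describes a 5 DOF model in parametric
format with severe cross-coupling and non-linear behaviour. This study
has considered the reduced problem of a 4 DOF controller for the pitch and yaw
planes without roll coupling. The angular and translational equations of motion
of the missile airframe are given by:

$$\dot{r} = \tfrac{1}{2} I_{yz}^{-1} \rho V_o S d \left( \tfrac{1}{2} d C_{nr} r + C_{nv} v + V_o C_{n\zeta} \zeta \right)$$
$$\dot{v} = \tfrac{1}{2m} \rho V_o S \left( C_{yv} v + V_o C_{y\zeta} \zeta \right) - U r \qquad (2)$$

$$\dot{q} = \tfrac{1}{2} I_{yz}^{-1} \rho V_o S d \left( \tfrac{1}{2} d C_{mq} q + C_{mw} w + V_o C_{m\eta} \eta \right)$$
$$\dot{w} = \tfrac{1}{2m} \rho V_o S \left( C_{zw} w + V_o C_{z\eta} \eta \right) + U q \qquad (3)$$

where the axes (x, y, z), rates (r, q) and velocities (v, w) are defined in Fig. 2.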




Fig. 2. Airframe axes 



Equations (2) and (3) describe the dynamics of the body rates and velocities under
the influence of external forces (e.g. $C_{zw}$) and moments (e.g. $C_{mq}$) acting on the
frame. These forces and moments, derived from wind tunnel measurements, are
non-linear functions of Mach number, longitudinal and lateral velocities, control
surface deflection, aerodynamic roll angle and body rates.

The description of the model is obtained from data supplied by Matra-BAE
and detailed in the Horton report. As both horizontal and vertical lateral
motion is symmetric in format, both will be dealt with together, taking into
account the appropriate sign changes in derivatives for each lateral direction.

It has been shown [5] that the desired tracking performance for lateral
acceleration can be obtained by assuming an exact knowledge of aerodynamic
coefficients and missile configuration parameters (i.e., reference area, Mach
number, mass, moment of inertia). In practice, however, this assumption is not valid;
moreover, if there are parameter variations or external disturbances, feedback
linearisation can no longer guarantee the desired performance, nor is robustness
guaranteed. For these reasons, a combination of an input/output linearisation
technique (nonlinear control law) and a fuzzy logic controller (trajectory
controller) has been chosen for consideration here.



4 Evolutionary Algorithm Structure 

The proposed framework maintains a population of fuzzy rule sets with their 
membership functions and uses the evolutionary algorithm to automatically de- 
rive the resulting fuzzy knowledge base. 

A hybrid real-valued/binary chromosome has been used to define each individual
fuzzy system. Figure 3 shows the chromosome structure for a five membership
function system. The real-valued parameters are defined as being the
$[\delta_a \, \delta_b \, \delta_c \, \delta_d]$ and lie in the range (0, 1]. The binary component encodes the set of
rules used in the system. Each rule is either on or off (0/1) and corresponds to
the form:

if $A_i$ AND $B_j$ then $O_k$ (4)

where $A_i$ denotes membership function i of input A, $B_j$ denotes membership
function j of input B, and $O_k$ denotes membership function k of the output O.
This process allows a full set of rules to be developed for the fuzzy system, but
maintains a fixed-length chromosome. The five membership function structure
leads to a chromosome with 12 real-valued genes and 125 binary genes. The fuzzy
system used the product for the membership function 'AND'. The 'OR' function was not
required as the rules were all expressed as 'AND' terms. The implication method
was to choose the minimum value and crop the output membership functions. The
aggregation method was to choose the maximum values of the set of membership
functions. A centroid approach was used to defuzzify the output.
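
The binary part of the chromosome can be decoded into active rules as sketched below. The text does not give the ordering of the 125 rule genes, so the row-major mapping from a gene index to the triple (i, j, k) is an assumption of this illustration, as are the names used.

```python
N_MF = 5               # membership functions per variable
N_REAL = 12            # real-valued genes
N_RULES = N_MF ** 3    # one on/off gene per (A_i, B_j, O_k) combination


def decode_rules(bits):
    """Return the (i, j, k) triples of all rules 'if A_i AND B_j then O_k' that are on.

    Assumes gene index = 25 * i + 5 * j + k with zero-based i, j, k.
    """
    if len(bits) != N_RULES:
        raise ValueError("expected %d binary genes" % N_RULES)
    rules = []
    for index, bit in enumerate(bits):
        if bit:
            i, rest = divmod(index, N_MF * N_MF)
            j, k = divmod(rest, N_MF)
            rules.append((i, j, k))
    return rules


def chromosome_length():
    """Total number of genes in the hybrid chromosome."""
    return N_REAL + N_RULES
```

With five membership functions per variable there are 5 × 5 × 5 = 125 possible rules, which together with the 12 real-valued genes gives a chromosome of 137 genes.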

Fig. 3. FLC chromosome structure

The evolutionary algorithm follows the usual format of ranking, selection,
crossover, mutation and evaluation, but with the real and binary parts of the
chromosomes being processed separately. A multi-objective approach was used
to identify good solutions. A method known as non-dominated ranking was used
in the evolutionary algorithm to allow the multi-objective problem to be handled
easily. A detailed description of the NSGA non-dominated ranking process may
be found in [Z|, and is based on several layers of classifications of the individuals. 

To classify the individuals, the population is ranked on the basis of
non-domination. With a single objective it is simple to distinguish between two
solutions, as one will have a better objective value than the other. If we have
multiple objectives, solution A may be better than solution B on objective 1 but
worse on all other objectives. We can no longer say which solution is best, and
so we say they are non-dominated.

In the NSGA algorithm, all non dominated individuals are classified into 
one category (with a dummy fitness value, which is proportional to the popula- 
tion size, to provide an equal reproductive potential for these individuals). To 
maintain the diversity of the population, these classified individuals are shared 
using their dummy fitness values. Then this group of classified individuals is ig- 
nored and another layer of non dominated individuals is considered. The process 
continues until all individuals in the population are classified. 
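
The layered classification described above can be sketched as follows, assuming that all objectives are minimized. The dummy fitness assignment and the sharing step are omitted, and the function names are chosen for this illustration only.

```python
def dominates(f, g):
    """True if objective vector f Pareto-dominates g (all objectives minimized)."""
    return all(x <= y for x, y in zip(f, g)) and any(x < y for x, y in zip(f, g))


def non_dominated_layers(objectives):
    """Split the indices of `objectives` into successive non-dominated layers."""
    remaining = list(range(len(objectives)))
    layers = []
    while remaining:
        # Individuals not dominated by any other individual still unclassified.
        front = [i for i in remaining
                 if not any(dominates(objectives[j], objectives[i])
                            for j in remaining if j != i)]
        layers.append(front)
        # Ignore the classified group and consider the next layer.
        remaining = [i for i in remaining if i not in front]
    return layers
```

Each pass removes at least one individual, so the loop terminates once every member of the population has been assigned to a layer.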

The NSGA algorithm has been used instead of other algorithms such as
MOGA because, when the work was started, there were no indications from the
problem as to which algorithms may be better. Work has since shown that in
this context, the performance of NSGA and MOGA is very similar.

A population of 100 individuals was maintained by the algorithm. In each
generation, 20 individuals were selected for breeding using stochastic universal
sampling. The choice of using 20 individuals per generation is a compromise between
processing costs and adequate sampling of the search space. By maintaining a
larger population of 100, some of the benefits of a larger population are
retained. Crossover was performed at a rate of 0.9, with intermediate crossover
being used for the real values and uniform multi-point crossover for the binary
part. A mutation rate of 2/137 was used to give on average two mutations per
chromosome, and a selective pressure (SP) of 1.7 was used. The high crossover
rate and low selective pressure slow convergence to help prevent local optima
from being exploited. The twenty new individuals were evaluated and then
concatenated to the old population, forming a set of 120 individuals. Non-dominated
ranking was then applied to this set and the best 100 were taken for the next
generation.

Fig. 4. Genetic algorithm structure

In this application much of the feasible space of the controller is little used
(see the results section). The genes responsible for these areas will settle to some
semi-random state. That is why controllers whose used control surfaces are very
similar may sometimes have very different chromosomes. This feature upsets the
sharing process. A token value of $\sigma_{share} = 0.5$ was used because, in this problem,
varying $\sigma_{share}$ has little effect.



5 Objectives 



The closed loop performance criteria are chosen as follows:

1. Side-slip velocity steady state error:

   \eta_1(x) = \frac{Er^* - Er(x)}{Er_{max} - Er_{min}}   (5)

2. Overshoot:

   \eta_2(x) = \frac{Os^* - Os(x)}{Os_{max} - Os_{min}}   (6)

3. Rise time:

   \eta_3(x) = \frac{Tr^* - Tr(x)}{Tr_{max} - Tr_{min}}   (7)

4. Settling time:

   \eta_4(x) = \frac{Ts^* - Ts(x)}{Ts_{max} - Ts_{min}}   (8)



Table 1 shows the reference points used in the objective calculations.



Table 1. Closed loop performance criteria 



            Steady state error (%)   Settling time (sec)   Rising time (sec)   Overshoot (%)

Reference   Er* = 0.0                Ts* = 0.15            Tr* = 0.08          Os* = 4.5
Maximum     Er_max = 2.0             Ts_max = 0.25         Tr_max = 0.14       Os_max = 25.0
Minimum     Er_min = 0.0             Ts_min = 0.1          Tr_min = 0.07       Os_min = 2.0

6 Results 

6.1 Lateral Acceleration Control 

In the previous work Pj velocity was controlled. In this paper, the technique has 
been applied to lateral acceleration control. As a result of the multi-objective 
optimisation multiple solutions were obtained from which the designer can choose 
one which satisfies the requirements and is preferred. 

In Fig. 0 we have shown a set of lateral acceleration responses from different 
fuzzy controllers. Some are bad with high overshoot values, very slow on rise 
time or settling time, but some are very good with almost no steady state error 
and no overshoot. 

The figures leading up to Fig. 11 show three of the fuzzy gain surfaces and their
corresponding acceleration responses: one that is best on steady state error; one
that is within 6% error from the demand, which is probably not acceptable although
it has no overshoot and a satisfactory rise time; and finally Fig. 11, which is
too slow on rise time and settling time but within 3% on steady state error, and so
may not be considered acceptable by the designer. The dashed line is the augmented
acceleration, which has almost identical closed loop performance criteria to the
lateral acceleration. The only difference is the non-minimum phase effect, which
can be seen in the solid line for the lateral acceleration. Recall that the
augmented acceleration has been used to design the nonlinear control law, but the
actual lateral acceleration has been used as the controlled output.

The fuzzy surface has been developed with the model exercising the nominal 
aerodynamic coefficients. Figure 0 shows the fuzzy surface of a trajectory con- 
troller generated by the evolutionary algorithm with the paths taken for the nom- 
inal (circles) case. This is the controller that delivers the smallest steady state 

Fig. 11. Acceleration Response (c), slow rise time (lateral acceleration, 2g demand)



error. The fuzzy logic controller has been tuned for a demand of 20 m/sec^2,
corresponding to a 2g lateral acceleration pull, which is plotted on the graphs.
In a typical run, about 80% of the solutions in the final population are
non-dominated.

A detailed trade-off surface for the individuals in a population is illustrated
in Figs. 12 and 13. The intention here is to show how solutions evolve over the
generations, so that the trade-off dynamics can also be seen. Each point in each
plot is a non-dominated solution. As there are four objectives, there are six
possible pairwise combinations plotted. Before convergence is achieved there are
some bad and unwanted solutions clustered along the high end of the scale of each
objective (shown in Fig. 12), which die out as the evolution progresses. After
convergence (after approximately 50 generations) most of the objective values are
within the specified required range, which indicates that the solutions have
converged towards the desired feasible area (see the scale in Fig. 13). There are
a few solutions strongly dominated on one objective, which obviously will not be
considered further by the decision maker. The decision maker makes the final
choice of solutions by applying subjective preferences for some of the objectives
over others. For example, the steady state error is the primary objective to be
minimised, while up to ten percent overshoot could be acceptable. This allows the
designer to narrow down the set of solutions that need to be analysed in detail.



Fig. 12. Trade-off between each objective at generation 50 (axes shown include settling time, overshoot, and error)

Fig. 13. Trade-off between each objective at the last generation, 250 (axes shown include settling time, overshoot, and error)

7 Conclusions

For the control of a non-linear missile’s lateral acceleration, we have shown that 
the evolutionary algorithms can produce a good set of results that populate the 
Pareto solution set, allowing the system designer the flexibility of trading one 
solution against others to achieve a desired performance. 

As Fig. 0 demonstrates, by monitoring the usage of the controller surface, 
combined with the fuzzy control approach, an insight can be gained into the 
operation of robust controllers that are created using the evolutionary process. 

Acknowledgements. The authors would like to thank Matra-BAE for provid- 
ing the data for the missile model. 
Abstract. We present an evolutionary approach to a difficult, multiobjective problem
in groundwater quality management: how to pump-and-treat (PAT) contaminated
groundwater to remove the most contaminant at the least cost. Although evolutionary
multiobjective (EMO) techniques have been applied successfully to monitoring of
groundwater quality and to containment of contaminated groundwater, our work is a
first attempt to apply EMO to the long-term (ten year) remediation of contaminated
water. We apply an improved version of the Niched Pareto GA (NPGA2) to
determine the pumping rates for up to fifteen fixed-location wells. The NPGA2 uses
Pareto-rank-based tournament selection and criteria-space niching to find non-
dominated frontiers. With 15 well locations, the niched Pareto genetic algorithm is
demonstrated to outperform both a single objective genetic algorithm (SGA) and
enumerated random search (ERS) by generating a better tradeoff curve.



1 Introduction 

Groundwater cleanup is one of the major technical and environmental challenges 
facing us at the start of the new millennium. This problem domain has vast, non-linear 
search spaces, appropriate for evolutionary optimization. And because of the large 
and diverse groups of people, animals, plants, and industries affected, the criteria for 
groundwater decision making are inherently multiple and conflicting. When faced 
with subsurface remediation management problems, decision makers must frequently 
weigh several conflicting objectives such as minimizing cost, minimizing health risk, 
minimizing cleanup time, and maximizing reliability (amount of contaminant 
captured). It would thus be of value to the decision makers to view the best tradeoffs 
between the conflicting objectives. 
1.1 Previous Approaches

The majority of applications of optimization tools to subsurface remediation problems 
have been based on single objective optimization methods. [1], and more recently 
[2], provide extensive reviews on the applications of optimization to groundwater 
remediation design, while [3] actually apply a single objective GA to the optimal 
remediation design problem. To date, however, we can find no application of an 
evolutionary multiobjective (EMO) algorithm to the remediation problem. But 
several recent successful applications of EMO approaches to related groundwater 
problems, namely containment and monitoring, suggest to us that EMO might be 
effective in the remediation domain. 

For example, [4] applied two variations of the genetic algorithm (GA), a Pareto
GA and a vector-evaluated genetic algorithm (VEGA), to a multiobjective 
groundwater pollution containment problem. The multiobjective problem was 
formulated to minimize the design cost while maximizing the mass contained. The 
Pareto GA relied on a ranking scheme that ordered the population according to each 
containment design’s degree of domination. [4] concluded that the Pareto GA was 
superior to VEGA in finding the largest portion of the Pareto optimal solutions. 

The authors of [5] investigated the performance of four GA formulations in 
solving a multiobjective groundwater monitoring problem where they simultaneously 
maximized reliability of a monitoring system and minimized the contaminant plume 
size at time of first detection. They found that a VEGA/Pareto GA combination was 
more computationally efficient and more successful at generating the greatest portion 
of the tradeoff curve than other GA formulations. They recommended that a form of 
fitness sharing [6] be used to enhance the tradeoff curve coverage of the Pareto GA. 



1.2 Purpose and Scope 

In this work, we present an improved version of the niched Pareto genetic algorithm 
(NPGA2), a multiobjective technique originally developed by [7], and apply it to a 
hypothetical contaminated groundwater remediation scenario. The original, 1994 
version of the NPGA was successfully applied to a version of the groundwater 
monitoring problem in [5], to detect contaminant plumes. Here we use a new NPGA2 
for treatment of the plume. 

The objectives of this work are to (a) develop an EMO approach for optimal
groundwater remediation design using the NPGA2, (b) test the NPGA2 for efficacy
by comparing its performance to that of two other optimization approaches, a
single-objective genetic algorithm (SGA) and an enumerated random search (ERS),
and (c) begin exploring the ability of this approach to scale up to larger
versions of the problem.
2 The Niched Pareto Genetic Algorithm

Since 1993, a number of researchers have implemented Goldberg’s 1989 [6] general 
suggestion of combining a Pareto-based selection with some form of fitness sharing to 
achieve a selective pressure toward the non-dominated front while maintaining 
diversity along that front. [7] developed the NPGA which implements the basic 
suggestion of Goldberg [6] by using Pareto domination tournaments for selection, 
and fitness sharing in objective space to introduce speciation in a partially ordered 
search space. [8]’s Multiobjective GA (MOGA) used a similar strategy, with Pareto 
rank-based selection and objective space fitness sharing, while the non-dominated 
sorting GA (NSGA) [9] used Goldberg’s [6] suggested ranking method and then used 
fitness sharing in the genotypic and phenotypic spaces (rather than in the objective 
space). Since then, other researchers have implemented their own variations on these 
similar strategies, all with relative success. Below we describe the NPGA2 and 
compare and contrast it with the original 1994 version, as well as with MOGA and 
with NSGA. 



2.1 The Original NPGA 

The extension of the traditional GA to the 1994 NPGA involved the addition of two
specialized genetic operators: (1) Pareto domination tournaments and (2)
continuously updated fitness sharing. These operators alter the traditional mechanism
of selection by partially ordering the population and by maintaining diversity in the
population through successive generations, respectively.

In the 1994 NPGA, tournament selection was used because of its simplicity, well- 
known effectiveness, and the ability to adjust selection pressure by changing the 
tournament size. Tournaments were modified to handle the Pareto criterion in the 
following manner. k competitors were chosen at random from the current 
generation’s population. A set of individuals was also chosen at random as the 
Pareto sample set P. Each of the k competitors was compared to each member of P. 
If a competitor was dominated by a member of P it lost the competition. If all of the k 
competitors lost, then fitness sharing was used to break the “tie”. Each competitor’s 
niche count (see below) was calculated in the objective space (i.e., using its evaluated 
objective values, such as cost and reliability, as coordinates). The competitor with 
lowest niche count (i.e., the “least crowded”) won the tournament. As with normal 
tournament selection, enough tournaments were held to fill the next generation’s 
population with “winners”. Crossover and mutation could then be applied to the new 
population. 



2.2 The NPGA2 

Although the original NPGA was successfully applied to several test problems, 
including a groundwater detection problem, we felt that the Pareto domination 
sampling used in the tournaments could be made less noisy. Indeed, the sampling can
be seen as an approximation of Fonseca and Fleming’s [8] Pareto ranking scheme, in
which the rank of an individual is determined by the number of current population
members that dominate the individual. In the NPGA, the more members that
dominate a solution, the more likely it is that a random sample of the population will
include a dominating member. To make such Pareto ranking deterministic in the new
NPGA2, we use the degree of domination of an individual as its rank, as in [8] and as
used by others since. But unlike [8], we maintain the use of tournament selection,
simply using Pareto rank to determine tournament winners. As in the original NPGA,
tournament ties are resolved by fitness sharing, using niche counts as calculated in the
objective space. A more detailed discussion of the new NPGA2 selection and niching
mechanisms follows.

To initiate selection each individual in the population of designs is assigned a 
rank equal to the degree of Pareto domination experienced by that design. The degree 
of domination, or rank, of an individual design is the total number of designs in the 
population that dominate that design. A design is said to (Pareto) dominate another 
individual in the population if it is at least equal in all objectives to that individual and 
better in at least one. Non-dominated designs, or those that are not dominated by any 
individuals in the population, are assigned a rank of zero. In Fig. 1, where the 
objectives are to minimize both cost and mass remaining, an example of a Pareto 
domination ranked population of ten designs is shown. Designs of equal rank are 
designated by the same symbol. 
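The ranking just described can be sketched in a few lines of Python. This is a minimal illustration that assumes every objective is minimized and that each design is represented only by its tuple of objective values; the function names and the sample (cost, mass remaining) values are hypothetical.

```python
def dominates(a, b):
    """True if objective vector a Pareto-dominates b (all objectives minimized)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def domination_ranks(population):
    """Rank of each design = number of designs in the population dominating it."""
    return [sum(dominates(other, design) for other in population)
            for design in population]


# Hypothetical (cost, mass remaining) pairs, both to be minimized.
designs = [(1.0, 9.0), (2.0, 6.0), (4.0, 3.0), (3.0, 7.0), (5.0, 5.0), (6.0, 6.0)]
ranks = domination_ranks(designs)  # non-dominated designs receive rank 0
```

The double loop costs a number of comparisons proportional to the square of the population size, which is negligible next to one simulation run per design.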




Fig. 1. Ranking by degree of Pareto domination 

Once the entire population has been ranked according to the Pareto domination 
rank, candidate designs are chosen for reproduction via deterministic tournament 
selection [6]. The controlling variable in tournament selection competitions is the 
tournament size k. A group of k competitors is randomly selected from the population 
of ranked designs. If there is a single candidate with the lowest rank (i.e. less 
dominated), this candidate is the “clear winner” of the tournament and is selected for 
reproduction. If there is no clear winner, none of the candidates are preferred and the 
tournament selection ends in a tie. 





The Niched Pareto Genetic Algorithm 2 685 



In the case of a tie, the population density around each competitor is calculated 
within a specified Cartesian distance (in objective function-space), called the niche 
radius (see Figure 5). A niche count is calculated by summing the number of designs 
within the niche radius of each competitor, weighted by the radial distance between 
the competitor and the other designs. The niche count for competitor i is given by: 

   m_i = \sum_{j \in pop} \left(1 - \frac{d_{ij}}{\sigma_{share}}\right) \quad \text{for } d_{ij} < \sigma_{share}   (1)

where d_ij is the distance between competitor i and other population member j and
σ_share is the niche radius. The winner of the tied tournament is the competitor with the
lowest niche count.
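A tournament with niche-count tie-breaking could look like the Python sketch below. It assumes a triangular sharing function (each neighbour closer than the niche radius contributes 1 - d/σ_share), Euclidean distance in already-scaled objective space, and that only the competitors tied at the lowest rank take part in the tie-break. The function names and argument layout are illustrative choices, not taken from the original implementation.

```python
import math


def niche_count(candidate, others, sigma_share):
    """Sum triangular sharing values over designs closer than sigma_share."""
    count = 0.0
    for other in others:
        distance = math.dist(candidate, other)
        if distance < sigma_share:
            count += 1.0 - distance / sigma_share
    return count


def tournament_winner(competitors, ranks, objectives, next_generation, sigma_share):
    """Return the index of the winning competitor.

    competitors: indices into `ranks` and `objectives`.
    next_generation: objective vectors already placed in the partially
    filled next generation (continuously updated sharing).
    """
    best_rank = min(ranks[i] for i in competitors)
    tied = [i for i in competitors if ranks[i] == best_rank]
    if len(tied) == 1:
        return tied[0]  # clear winner: a single least-dominated competitor
    # Tie: prefer the competitor in the least crowded region.
    return min(tied, key=lambda i: niche_count(objectives[i], next_generation,
                                               sigma_share))
```

With `objectives = [(0.1, 0.9), (0.5, 0.5), (0.9, 0.1)]`, `ranks = [0, 0, 1]`, and a next generation already crowded near `(0.1, 0.9)`, a tournament among all three designs selects index 1, the rank-0 design in the emptier region.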



As in the original NPGA, niche counts are calculated using individuals in the 
partially filled next generation population, rather than from the current generation 
population. The use of this continuously updated fitness sharing was suggested by 
[10]. They showed that the naive combination of tournament selection and fitness 
sharing would lead to chaotic perturbations of the population composition. Note that 
the values of the objective functions should be scaled to equal ranges in order to 
determine the niche count, as in: 



   O'_i = \frac{O_i - O_{i,min}}{O_{i,max} - O_{i,min}}   (2)

where O'_i, O_{i,min}, and O_{i,max} are the scaled, minimum, and maximum values of
objective O_i, respectively.
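The scaling step can be illustrated with a short Python sketch. It assumes that the minimum and maximum of each objective are taken over the current population (the text does not say where these bounds come from) and maps a constant objective to zero to avoid division by zero. The function name is hypothetical.

```python
def scale_objectives(population):
    """Scale each objective to the range [0, 1] across the population."""
    n_objectives = len(population[0])
    lows = [min(p[k] for p in population) for k in range(n_objectives)]
    highs = [max(p[k] for p in population) for k in range(n_objectives)]
    scaled = []
    for p in population:
        scaled.append(tuple(
            (p[k] - lows[k]) / (highs[k] - lows[k]) if highs[k] > lows[k] else 0.0
            for k in range(n_objectives)
        ))
    return scaled


# Hypothetical (cost, percent mass remaining) values on very different scales.
scaled = scale_objectives([(100.0, 10.0), (300.0, 50.0), (200.0, 30.0)])
```

Without scaling, the objective with the larger numerical range would dominate the distance calculation, and a single niche radius would not be meaningful along both axes.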



2.3 Comparison to Non-dominated Sorting 

NPGA2 uses tournament selection based on the ranking given by [8]’s degree-of- 
domination measure, which has been implemented in a number of other EMO 
algorithms. Goldberg, in [6], suggested a different ranking scheme. Srinivas and 
Deb [9] have named this non-dominated sorting, and it has also been implemented in 
a number of other EMO algorithms. In a non-dominated sort of the population, the 
current non-dominated subset of the population is assigned rank 0 and is then 
temporarily removed from consideration. The remaining population is then evaluated 
to determine another non-dominated subset, which is then given rank 1 and removed 
from consideration. This process continues until the entire population has been 
ranked [6]. As with the degree-of-domination ranking, the non-dominated ranks can 
then be used for selection in a number of ways, such as tournaments. In the case of 
the NSGA, Srinivas and Deb use proportionate selection with fitness sharing. 

How does the ranking generated by a non-domination sort compare to the ranking
generated by the degree-of-domination? First, it is clear that both rankings treat the
on-line Pareto optimal set (i.e., the non-dominated set from the current generation
population) the same: every member of the Pareto set gets a rank of zero. As for the
rest of the population, it can be shown that both types of ranking respect the Pareto 
ordering. That is, if an individual A Pareto dominates an individual B, then both 
types of ranking will rank A better (lower) than B. In the case of non-dominated 
sorting, this is quite clear, as A must be removed from consideration before B can be 
considered non-dominated, and therefore A must be ranked before B. In the case of 
degree-of-domination ranking, B must have a greater degree of domination than A 
because every individual that dominates A must also dominate B (by the transitive 
property of the Pareto domination relation), and by assumption A dominates B as 
well. Thus B’s degree of domination must be greater than that of A. 

Non-dominated ranking can differ from degree-of-domination ranking, however,
when comparing individuals that are not ordered by the Pareto domination relation.
Thus an individual A might be ranked better (lower) than an individual B by non-
dominated ranking, because it is “closer” to the Pareto front, while being ranked
worse (higher) by degree-of-domination ranking, because A is more “crowded” by
solutions that dominate it.¹
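The difference between the two rankings can be made concrete with a small Python sketch. It assumes two minimized objectives and uses a hypothetical seven-point population constructed so that one individual (A) sits just behind a crowded part of the front while another (B) sits behind a sparse part. Both function names are illustrative.

```python
def dominates(a, b):
    """True if objective vector a Pareto-dominates b (all objectives minimized)."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def degree_of_domination(population):
    """Rank = number of population members that dominate the individual."""
    return [sum(dominates(q, p) for q in population) for p in population]


def non_dominated_sort(population):
    """Rank = index of the successive non-dominated front the individual is in."""
    ranks = [None] * len(population)
    remaining = set(range(len(population)))
    rank = 0
    while remaining:
        front = {i for i in remaining
                 if not any(dominates(population[j], population[i])
                            for j in remaining)}
        for i in front:
            ranks[i] = rank
        remaining -= front
        rank += 1
    return ranks


# Hypothetical population: four front members, then A, D, and B.
population = [(1, 5), (2, 4), (3, 3), (7, 0.5), (4, 6), (7.5, 1), (8, 1.5)]
A, B = 4, 6
by_degree = degree_of_domination(population)
by_sorting = non_dominated_sort(population)
```

Here A is dominated by three front members but lies in the second front, whereas B is dominated by only two individuals, one of which is itself dominated, so B falls in the third front. Non-dominated sorting therefore prefers A, while degree-of-domination ranking prefers B.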



3 The Groundwater Remediation Problem 



We apply the NPGA2 to a new problem domain: cleanup of a hypothetical
contaminated groundwater site. The site is given an initial contaminant plume, after
which active remediation, in the form of pump-and-treat (PAT) technology, is 
applied. A fixed number of wells in fixed locations are used to pump out 
contaminated water to be treated on the surface. The goal of the NPGA2 is to find the 
optimal pumping rates for the wells such that total cost (e.g., power to the pumps, 
volume of water treated, etc.) is minimized while also minimizing the amount of 
contaminant remaining after the treatment period (ten years). For each set of decoded 
pumping rates generated by the NPGA2, we run a computer simulation of the 
resulting groundwater and contaminant flow over the ten year remediation horizon, to 
determine a total cost of treatment and total contaminant mass remaining. 

Although our specific test cases are artificial and somewhat idealized, the overall 
approach is realistic. In real-world situations, contamination is detected, an initial 
contaminant plume is estimated (based on known mass of missing contaminant and/or 
sampling of concentrations from test wells), the hydrology flow field is modeled 
(again based on known and sampled geological conditions), and various pumping 
strategies (e.g., locations, depths, and pumping rates) are simulated. Remediation 
strategies exhibiting the best tradeoffs between total cost and contaminant mass 
remaining, under simulation, are chosen for implementation. The rest of this section 
provides details of the simulation model. 



¹ But this can only be true if neither individual is Pareto optimal, i.e., neither is of rank 0.
3.1 Description of the Contaminated Site

The site is modeled as a confined homogeneous aquifer 1000 by 1010 meters in area 
and 30 meters in thickness (see Fig. 2). The model is discretized in two dimensions 
using 10-m square grid blocks. Constant-head boundaries are imposed on the east and 
west sides of the domain while no-flow boundaries are imposed on the north and 
south. (Thus the “background flow” of groundwater is a constant and uniform east- 
west flow.) The aquifer is modeled as having a homogeneous, isotropic hydraulic 
conductivity (i.e., uniform porosity, etc.). Extraction wells are modeled as being open 
over the entire thickness of the confined aquifer. Removal of contaminant by the 
treatment system is simulated as equilibrium, nonlinear adsorption. The adsorbent
concentration, q, is given by the Freundlich isotherm equation: q = K_ab C^(1/n).

For a hypothetical contaminant, we chose trichloroethylene (TCE), a commonly
observed and studied groundwater contaminant. TCE is treated as a conservative,
dissolved species. To generate the initial plume of contaminant shown in Fig. 2, a 
constant source of approximately 750 ppm is applied until a plume of approximately 
1,000 kg of TCE is released into the confined aquifer. At that point, the source is 
removed and active remediation begins, continuing for a ten year period. The ex-situ 
treatment technology is granular activated carbon (GAC). 



Fig. 2. The initial contaminant plume to be remediated (domain 1000 meters across, with no-flow boundaries on two opposite sides; an arrow marks the direction of flow without pumping)

3.2 Problem Encoding: Pumping Rates as Decision Variables

The decision variables are the extraction rates for each well, and are encoded as fixed
length binary strings which decode to discrete pumping rates between zero and a
maximum that is determined, per well, by the maximum drawdown allowed.
Drawdown is the reduction in the height of the water table below normal levels
caused by pumping out groundwater. To avoid excessive drawdown, we constrain the
maximum pumping rates.

In an effort to avoid explicit constraints and the need to formulate penalty
functions, equations (3) and (4) are implicitly subject to a constraint that limits the
maximum extraction rate per well. The constraint 0 ≤ Q_j ≤ Q_max for j = 1, ..., N_w,
where Q_max is the maximum total extraction rate allowed per well, ensures
satisfaction of drawdown constraints. It is applied through the encoding/decoding of
the decision variables, such that the value of a single decision variable cannot
exceed Q_max. We chose Q_max by running a series of flow simulations at various
extraction rates and selecting the pumping rate that results in the maximum
allowable drawdown (15% of the aquifer thickness).
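One way to realise such an encoding is sketched below in Python. It assumes a plain (non-Gray) binary code and a linear mapping from the integer value of each gene onto the interval from zero to the maximum rate, neither of which is specified in the text. The function name and the string representation of the chromosome are also assumptions.

```python
def decode_rates(chromosome, bits_per_well, q_max):
    """Decode a '0'/'1' string into one pumping rate per well, each in [0, q_max]."""
    if len(chromosome) % bits_per_well != 0:
        raise ValueError("chromosome length must be a multiple of bits_per_well")
    levels = 2 ** bits_per_well - 1  # largest integer a gene can encode
    rates = []
    for start in range(0, len(chromosome), bits_per_well):
        gene = chromosome[start:start + bits_per_well]
        rates.append(int(gene, 2) * q_max / levels)
    return rates


# Two wells, four bits each, hypothetical maximum rate of 33 m^3/d per well.
rates = decode_rates("11110000", bits_per_well=4, q_max=33.0)
```

Because every bit pattern decodes to a rate no larger than the maximum, the drawdown constraint is satisfied by construction and no penalty term is needed for it.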



3.3 Two Conflicting Objectives: Cost and Cleanup 

The two objectives are to minimize cost and maximize cleanup performance. The 
cost of a PAT remediation system includes capital costs incurred from installation of 
recovery wells and operational costs from pumping and groundwater treatment as in 

N„,. 

miny 

7=1 

where J is the total cost of the remedial design; N_w is the number of active wells;
NSTP is the number of time steps within the remediation horizon T; Q_j is the pumping
rate of well j; H_j is the total dynamic head at well j; C_{i,j} is the average flow-weighted
concentration removed by well j in time step i; t_i is the length of time step i; K_ab and
1/n are Freundlich GAC adsorption parameters for a given contaminant and carbon
adsorbent; and a_1, a_2, and a_3 are coefficients for capital, pumping, and treatment costs,
respectively.

We use mass remaining as an indicator of cleanup performance. The mass 
remaining objective function can be formulated as: 

   \min MR' = 100 \, \frac{MR}{MI}   (4)

where MR’ is the percent mass remaining after the remedial horizon is complete; MI 
and MR are the initial mass present in the aquifer at the beginning of the remediation 
horizon and the mass remaining at the end of the remediation horizon, respectively. 
3.4 The Simulation

The simulation consists of a flow and transport model of the groundwater flow and a 
particle tracker to estimate contaminant dispersal. These models are run for a 
simulated remediation period of ten years. The well pumps are run at the rates 
decoded from the chromosome, for the ten years. The effects of the pumping on the 
groundwater flow are simulated. The contaminant flow is then estimated via particle 
tracking through the resulting groundwater flows, over ten years. The contaminant 
concentrations in the pumped water can then be found, and their effects on cleanup 
costs (J in equation (3) above) calculated. The second objective value, mass
remaining (MR), is then easily computed as the difference between the initial 
contaminant mass MI and the total mass of contaminant pumped out over ten years. 
Details of the models follow. 

The state variables, hydraulic heads and contaminant concentrations, are deter- 
mined by applying a simulator for modeling groundwater flow and contaminant 
transport in a non-deforming, saturated, two-dimensional aquifer system. The steady- 
state confined groundwater flow equation is 

   \nabla \cdot (K \nabla h) = \sum_j Q'_j \, \delta(x - x_j, y - y_j)   (5)

where K is the hydraulic conductivity and Q'_j is the volumetric flux from well j
located at x_j and y_j. The hydraulic head, h, is related to the total dynamic head, H, by
H = d_top − (h − b), where d_top is the depth to the top of the aquifer and b is the aquifer
thickness.

Contaminant concentrations are determined by solving the mass balance equation 
for a neutrally-buoyant, conservative aqueous chemical constituent, given by 

   \frac{\partial c}{\partial t} + \nabla \cdot (v c) - \nabla \cdot (D \cdot \nabla c) = -\frac{c_0 Q'_j}{n} \, \delta(x - x_j, y - y_j)   (6)

where c is the aqueous concentration in the aquifer, c_0 is the aqueous concentration
removed by well j, and n is the effective porosity.

We employ a 2-D finite difference approximation to solve the groundwater flow 
equation (5) and a particle-tracking method to solve the mass transport equation (6). 
The numerical codes have been validated by [11]. Additional information pertaining
to the development of this numerical simulator can be found in [12]. 
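To give a feel for the finite difference step, the Python sketch below solves a steady-state cell balance for the heads on a small grid with constant-head west and east boundaries and no-flow north and south boundaries, as in the site description. It is not the authors' simulator: the Gauss-Seidel iteration, the use of a single transmissivity value (conductivity times thickness), the grid size, and all names are assumptions made for illustration only.

```python
def solve_heads(nx, ny, h_west, h_east, transmissivity, wells, sweeps=2000):
    """Steady-state heads on an ny-by-nx grid of square cells.

    wells maps (row, col) to an extraction rate (positive = pumping out).
    Columns 0 and nx-1 are constant-head cells; rows 0 and ny-1 border
    no-flow boundaries, so they simply have one neighbour fewer.
    """
    # Start from the linear head profile of the undisturbed background flow.
    h = [[h_west + (h_east - h_west) * c / (nx - 1) for c in range(nx)]
         for _ in range(ny)]
    for _ in range(sweeps):  # Gauss-Seidel sweeps, updating in place
        for r in range(ny):
            for c in range(1, nx - 1):
                neighbours = [h[r][c - 1], h[r][c + 1]]
                if r > 0:
                    neighbours.append(h[r - 1][c])
                if r < ny - 1:
                    neighbours.append(h[r + 1][c])
                q = wells.get((r, c), 0.0)
                # Cell balance: T * sum(h_nb - h) = q for each interior cell.
                h[r][c] = (sum(neighbours) - q / transmissivity) / len(neighbours)
    return h


background = solve_heads(11, 5, 10.0, 9.0, 100.0, {})
pumped = solve_heads(11, 5, 10.0, 9.0, 100.0, {(2, 5): 50.0})
```

Without wells the heads stay on the linear background profile, and adding an extraction well lowers the head around it, which is the drawdown that the maximum pumping rate is chosen to limit.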



4 Numerical Experiments 



The purpose of the numerical experiments is to (1) investigate the performance of the 
NPGA2 as a function of the size of the search space, and (2) compare the NPGA2 
performance with two other methods for generating the tradeoff curve. We chose to 
examine the performance of the NPGA2 with two, five, and fifteen decision variables,
where each decision variable corresponds to a pumping rate at a fixed-location well. 
The wells used in each scenario are, referring to the well numbers in Fig. 2, wells 7 
and 8 for the two-well scenario; wells 2, 6, 7, 8, and 12 for the five-well scenario; and 
all wells for the fifteen well scenario. Table 1 compares details of the three different 
remediation scenarios. 

Table 1. Problem characteristics of the two-, five-, and fifteen-well scenarios

Scenario   Computational cost    Decimal precision     Bits per decision   Maximum flowrate    Search space
           limit (fcn. evals)    in flowrate (m^3/d)   variable            per well (m^3/d)    size

2 wells    500                   2.5                   7                   250                 2^(2*7) ≈ 10^4
5 wells    1000                  2.5                   6                   100                 2^(5*6) ≈ 10^9
15 wells   2000                  2.5                   4                   33                  2^(15*4) ≈ 10^18



4.1 SGA and ERS Methods 

We compare the performance of the NPGA2 with two methods for generating the
tradeoff curve: a single-objective GA (SGA) and enumerated random search (ERS).

A common and straightforward approach to applying a GA to a multiobjective 
problem is to first find a way to reduce the problem to a single objective, and then to 
apply a traditional SGA to slightly different versions of this “scalarization”. For our
SGA we choose a constraint-based scalarization: minimizing cost is the objective and 
the percent mass remaining is held as a constraint. The percent mass remaining 
constraint was imposed via the multiplicative penalty approach described by [13]. 
The constrained optimization problem can be formulated as: min[J(1 + w·MR’)], where
the cost function, J, is identical to the cost function in equation (3); w is the constraint
violation weight; and MR’ is defined as in equation (4). The tradeoff curve is
constructed by executing separate SGA runs, each with a different value of the mass
remaining constraint.

Table 2. Optimization algorithm parameters

Algorithm (scenario)   Pop. size   Tourn. size   Niche radius   Prob. of crossover   Prob. of mutation

NPGA2 (2 wells)        50          2             0.5            0.9                  0.001
NPGA2 (5 wells)        50          2             0.05           0.9                  0.001
NPGA2 (15 wells)       100         10            0.5            0.9                  0.001
SGA (all scenarios)    50          2             -              0.9                  0.001

To implement an SGA, we simply used the NPGA2 with only the single objective
defined above, and no niching. The NPGA2 then reduces² to a “traditional, simple
GA”. The SGA thus uses the same encoding, crossover, and mutation operators (and
parameter settings) as the NPGA2. Table 2 compares the run parameters of the
NPGA2 with those of the SGA. The SGA was stopped when the entire population
converged to equal values of the decision variables. The best solution (lowest cost)
found during the run was used as the optimal value. We performed three separate
SGA runs with the constraint values equally distributed (on a log scale) over the
possible range of mass remaining.

The ERS is a simple approach and is used here for baseline results. Under ERS, 
values of the decision variables are randomly generated from a uniform distribution. 
The tradeoff curve is formed by the Pareto optimal subset of the randomly generated 
set of decision variables. There are no other operators, such as recombination or 
mutation. To implement the ERS we simply ran the NPGA2 for one generation with a
large enough population size that the entire fixed allocation of computational effort is 
used up in randomly initializing the population (i.e., in creating generation 0). 
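The ERS baseline can be sketched in Python as follows. The sketch assumes bit-string designs and minimized objectives, and the `evaluate` callback stands in for the flow and transport simulation. As a simplification, a repeated random draw still uses up one unit of the budget here. The function names are hypothetical.

```python
import random


def dominates(a, b):
    """True if objective vector a Pareto-dominates b (all objectives minimized)."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def enumerated_random_search(evaluate, n_bits, budget, seed=0):
    """Sample `budget` uniform random designs and keep the non-dominated ones."""
    rng = random.Random(seed)
    sampled = {}
    for _ in range(budget):
        design = "".join(rng.choice("01") for _ in range(n_bits))
        if design not in sampled:
            sampled[design] = evaluate(design)
    # The tradeoff curve is the Pareto optimal subset of everything sampled.
    return {design: f for design, f in sampled.items()
            if not any(dominates(g, f) for g in sampled.values())}


# Toy stand-in for the simulator: two conflicting functions of the bit string.
def toy_objectives(design):
    ones = design.count("1")
    return (ones, (len(design) - ones) ** 2 + int(design[0]))


front = enumerated_random_search(toy_objectives, n_bits=8, budget=200)
```

No selection, recombination, or mutation is involved, so the quality of the resulting curve depends only on how much of the search space the fixed budget happens to cover.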

Table 3. Share of aggregated Pareto optimal set found by each algorithm under each scenario

Algorithm   2 wells   5 wells   15 wells

ERS         30%       5%        0%
SGA         0%        0%        0.3%
NPGA2       70%       95%       99.7%



4.2 Performance Measures 

We compare the NPGA2, SGA, and ERS in terms of (a) the Pareto optimality of the
tradeoff curves and (b) the span of the tradeoff curves. In each numerical experiment, 
we use the same random seed. The method (NPGA2, SGA, or ERS) proceeds until a 
fixed amount of computational effort is consumed. One unit of computational effort is 
equivalent to one set of objective function evaluations, which involves one execution 
of each of the flow and transport models. One flow and transport model execution 
consumes approximately 2.5 minutes of CPU time on Sun Ultra 80 workstations with 
450 Mhz processors. 

Since the computational effort is fixed for each simulation/optimization 
experiment (see Table 1), we use as performance measures (a) the percentage of 
Pareto optimal solutions found by each method and (b) a qualitative evaluation of the 
span of the tradeoff curve covered by these solutions. We determine the percentage of 
Pareto optimal solutions found by a method m by aggregating all of the Pareto 
optimal solutions found by the methods and calculating the ratio of P_m, the number 
of Pareto optimal solutions found by method m, to the total number of Pareto optimal 
solutions found by all methods. The endpoints of the span are defined 
as the maximum MR’ (~100%) and the minimum MR’ achievable, given the 
maximum pumping rate constraint. 

^ Ranking by degree-of-Pareto-domination imposes a partial ordering on the search space, just 
as ranking based on non-dominated sorting does. But when the number of objectives is 
reduced to one, the partial ordering collapses to a total ordering, suitable for a SGA. 

We implement archiving to eliminate redundant objective function evaluations. 
If a design is found to have been evaluated previously, the previous evaluation is 
used. Only new designs are evaluated, so that only new designs count against 
computational effort. 
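
The archiving idea can be illustrated with a small Python sketch. It is not the implementation used in the study: the class name `EvaluationArchive`, the counter `effort_used`, and the stand-in function `toy_model` are hypothetical, and the real objective evaluation would run the flow and transport models.

```python
from typing import Callable, Dict, Tuple

Design = Tuple[float, ...]          # pumping rates, one per well
Objectives = Tuple[float, float]    # (cost, mass remaining)


class EvaluationArchive:
    """Caches objective values so each distinct design is simulated once."""

    def __init__(self, evaluate: Callable[[Design], Objectives]) -> None:
        self._evaluate = evaluate
        self._cache: Dict[Design, Objectives] = {}
        self.effort_used = 0  # units of computational effort consumed

    def __call__(self, design: Design) -> Objectives:
        if design not in self._cache:
            self._cache[design] = self._evaluate(design)
            self.effort_used += 1  # only new designs count against the budget
        return self._cache[design]


def toy_model(design: Design) -> Objectives:
    # Hypothetical stand-in for the flow and transport models.
    total = sum(design)
    return (total, 1.0 / (1.0 + total))


archive = EvaluationArchive(toy_model)
archive((1.0, 0.0))
archive((1.0, 0.0))  # repeated design: served from the archive
archive((0.5, 0.5))
print(archive.effort_used)  # prints 2
```

Keying the archive on the exact tuple of decision variables only detects identical designs; any tolerance-based matching would be an additional design decision.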




Fig. 3. Performances on the 15 well problem 



5 Results and Discussion 

Table 3 summarizes the results of all the experiments, in terms of percentage of 
Pareto optimal solutions found by each algorithm for each scenario. We aggregate all 
non-dominated solutions found by each algorithm, for a particular scenario, and take 
the Pareto optimal subset of the aggregation. We then credit each algorithm for its 
contribution to that set. Table 3 indicates that as the problem becomes harder 
(number of wells increases), simpler approaches like ERS are outperformed by the 
more sophisticated heuristics. 
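
The crediting procedure described above can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the authors' code: both objectives (cost, mass remaining) are assumed to be minimized, the function names are hypothetical, and the sample points are invented purely to exercise the code.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # (cost, mass remaining), both minimized


def dominates(a: Point, b: Point) -> bool:
    """True if a is no worse than b in every objective and differs from b."""
    return all(x <= y for x, y in zip(a, b)) and a != b


def pareto_subset(points: List[Point]) -> List[Point]:
    """Non-dominated subset of the points, with duplicates removed."""
    unique = list(dict.fromkeys(points))
    return [p for p in unique if not any(dominates(q, p) for q in unique)]


def shares(found: Dict[str, List[Point]]) -> Dict[str, float]:
    """Percentage of the aggregated Pareto set contributed by each method."""
    aggregate = pareto_subset([p for pts in found.values() for p in pts])
    return {
        name: 100.0 * sum(p in pts for p in aggregate) / len(aggregate)
        for name, pts in found.items()
    }


# Invented example data, for illustration only.
found = {
    "A": [(1.0, 0.9), (3.0, 0.2)],
    "B": [(2.0, 0.5), (3.0, 0.4)],
}
result = shares(found)
print(result)
```

In this sketch a point found by two methods is credited to both, so the shares can sum to more than 100%; how ties were credited in the study is not stated in the text.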

If we look at the 15 well case in more detail (Fig. 3), in which each algorithm’s 
Pareto optimal set is shown plotted in objective space, we can see that the genetic 
algorithms “dominate” ERS. The SGA comes close to the aggregated optimal front 
with two of its solutions, while its third solution is actually on the front. 

Finally, we take a look at a couple of individual solutions to the 15 well scenario, 
picked from two different parts of the aggregated Pareto optimal tradeoff curve. In 
Fig. 4, right, we graph the well pumping rates for a solution that costs less than 
$50,000 but leaves approximately 60% of the contaminant behind. Note how the 
NPGA2 found that only two wells need to be turned on, and these are directly 
downflow of the source. On the left of Fig. 4 we show contour plots illustrating the 
distribution of the remaining contaminant. In Fig. 5, we see another tradeoff solution, 
with MR’ = 15%, but this one must spend more power, and therefore 
money, pumping out more of the contaminant in order to achieve 15% 
mass remaining. Figs. 4 and 5 together demonstrate that the NPGA2 is designing 
reasonable strategies: (a) as MR’ increases, fewer wells are needed, (b) the pumping 
rates are more or less symmetric about the mean direction of groundwater flow, and 
(c) the remaining contaminant plume is mostly symmetric about the direction of flow. 





Fig. 4. NPGA2 solution for 60% MR’ design is reasonable 





Fig. 5. NPGA2 solution for 15% MR’ design 



6 Conclusions 

In summary, we have applied the new NPGA2 to a groundwater quality management 
problem consisting of active remediation by pump-and-treat. Through the addition of 
fitness sharing, we have overcome the problem described by [4] in finding Pareto 
optimal solutions over the entire tradeoff curve. 

To gain some confidence in the ability of the algorithm to scale with problem size 
and difficulty, we ran a series of test problems where the NPGA2 was compared to 
two other methods, SGA and ERS. As the problems increased in complexity, by 
considering additional decision variables, the NPGA2 was more effective and 
efficient than either the SGA or the ERS in finding more Pareto optimal designs that 
span the entire tradeoff curve. 

We conclude that EMO algorithms like the NPGA2 can be an effective method 
for producing tradeoff curves for subsurface remediation problems. Tradeoff curves 
such as those presented here may give decision makers the capability of making better 
informed decisions. 

In future work, we plan to consider additional objective functions, such as 
maximizing reliability, minimizing drawdown, or minimizing remediation time. We 
also plan to try other decision variables, such as time-variable extraction rates. 
Further tests of the applicability of this approach should consider realistic, 
contaminated sites, especially those with a significant degree of heterogeneity. Future 
comparative studies should include more sophisticated algorithms than ERS or SGA, 
such as other successful EMO methods, as well as multi-objective simulated 
annealing, and other multiobjective and classic optimization approaches. 



Abstract. MOLeCS is a classifier system (CS) which addresses its learning 
as a multiobjective task. Its aim is to develop an optimal set of rules, 
optimizing the accuracy and the generality of each rule simultaneously. 
This is achieved by considering these two goals in the rule fitness. The 
paper studies four multiobjective strategies that establish a compromise 
between accuracy and generality in different ways. The results suggest 
that including the decision maker’s preferences in the search process im- 
proves the overall performance of the obtained rule set. The paper also 
studies a third major objective: covering (the maintenance of a set of 
different rules solving together the learning problem), through different 
niching mechanisms. After a performance analysis using some bench- 
mark problems, MOLeCS is applied to a real-world categorization task: 
the diagnosis of breast cancer. 



1 Introduction 

The learning task performed by a classifier system (CS) is itself multiobjective: 
it has to find a concept description, usually represented by a set of 
rules, which should be: (a) complete, (b) accurate and (c) minimum. In terms 
of classification, a rule set is complete when it covers (satisfies) all the examples, 
whereas a rule set is accurate when it classifies the examples correctly (i.e., with- 
out misclassification errors). The third objective involves minimizing the number 
of rules, in order to obtain concise and comprehensible descriptions. Another re- 
lated objective is the system’s capability to express generalizations, that is, to 
generalize all the similar examples. 

These multiple objectives are closely connected. Generalization of equivalent 
examples allows more concise representations and leads up to smaller rule sets, 
promoting covering. But these objectives are opposed to accuracy in some way. 
If the system performs an excessive generalization, the accuracy of classification 
will be degraded. Thus, optimizing these conflicting objectives simultaneously 
involves a weak equilibrium, which is difficult for a CS to reach and maintain. 

The classifier systems’ community has solved this multiobjective learning 
task in an implicit way. Holland’s CS uses a credit allocation algorithm to 
evaluate the efficiency of each rule and a mechanism to allow the formation of 
hierarchies, although there is no explicit pressure towards the formation of general 
rules. XCS is a classifier system which has shown a strong tendency to 
achieve the stated objectives. One of its main strengths is the definition of fitness, 
based on the prediction accuracy of each rule rather than on the prediction 
itself. Although it does not use an explicit bias to favour the generalization of 
rules, this is achieved by the application of the GA to environmental niches [22]. 

MOLeCS is a recent CS [2] designed to solve classification tasks by supervised 
learning. Its main contribution is the definition of the accuracy and generality 
measures for each rule, and the use of multiobjective solution techniques for 
optimizing them simultaneously. In this sense, the learning task is represented 
explicitly as a multiobjective problem: the optimization of the accuracy and the 
generality of each rule. 

This paper studies, under the MOLeCS architecture, the performance of different 
multiobjective evolutionary algorithms in achieving the learning goals, 
which can be summarized as: obtaining the minimum rule set that accurately 
covers all the examples. Different niching strategies are also considered in order 
to maintain a parallel set of rules and thus achieve covering. First, we place 
the MOLeCS system into the CSs and MOEAs (multiobjective evolutionary 
algorithms) frameworks. In section 3 we describe MOLeCS and discuss different 
multiobjective approaches and niching methods. Next, we compare them using 
some benchmark problems often tested in the research community. We then 
show the application of MOLeCS to a real classification task based on the 
prediction of breast cancer. Finally, we give our conclusions and future work. 



2 Background 

In this section, we give a brief overview of classifier systems, remarking on the 
main differences between previous CSs and MOLeCS. Next, we review the 
similarities and differences between a typical multiobjective evolutionary algorithm 
and a classifier system such as MOLeCS. 



2.1 Classifier Systems 

Typically, learning classifier systems codify each individual as one rule, while 
the solution that must be obtained is a complete set of rules (that is, all the 
population). Two major issues arise from this approach: (a) the evaluation of 
each rule (fitness) and (b) the maintenance of a group of rules. 

The fitness evaluation method must provide a scalar measure that weighs 
the efficiency of each rule. In traditional classifier systems, this fitness measure 
was based on the payoff prediction; that is, the payoff that the classifier would 
receive from the environment if its action was selected (e.g. Holland’s CS). 
Recently, XCS [22] has migrated the fitness from the payoff prediction to the 
accuracy of the prediction, which results in better performance. Horn’s study 
also addresses the classifier’s accuracy, which is defined as the percentage of 
correctly classified examples over the covered training examples. 

Different niching mechanisms have been proposed in the research community to 
ensure covering by the co-evolution of different rules: sharing payoff between 
active classifiers, performing restricted replacement, or moving 
the GA from the panmictic population to the active classifiers (match set or 
action set), which can be classified as a kind of restricted mating. 

The classifier system’s ability to evolve generalizations is a major issue that 
has recently received growing interest. In XCS, the application of the GA to the 
action sets has resulted in a pressure towards generalizations. The generalization 
hypothesis states that given two classifiers (rules) C1 and C2 (both equally 
accurate, where C2 is a generalization of C1) the more general rule (C2) tends to 
displace the more specific one (C1). This is due to the fact that the more general 
rule participates in more action sets, having more reproductive opportunities 
and thus achieving more copies. Frey and Slate proposed a CS where fitness 
was based on accuracy. A pressure towards generalization was induced using 
a “utility” measure. This was computed as the number of correctly classified 
examples over the total number of examples seen by the system. Rules with a 
low “utility” measure were deleted in order to favour the rules used more often. 

From the accuracy perspective, our approach is more related to Horn’s 
study, since fitness is based on accuracy, computed as the percentage of 
correctly classified examples. However, our system also takes the classifier’s 
generality into account, which is a more complex task. Generality in MOLeCS is 
considered explicitly in the fitness evaluation stage, in contrast to XCS where 
generality is enforced in an implicit way. Frey and Slate also proposed the use 
of two different measures similar to ours, although they did not try to optimize 
them with multiobjective techniques. In this sense, the application of 
multiobjective techniques is new and offers promising perspectives for learning systems. 
Covering in MOLeCS is induced in the replacement stage and it is based on 
restricted replacement methods. Although Michigan style classifier systems are 
incremental, we have started with a non-incremental proposal. Our main reason 
is to understand the task of the multiobjective evaluation and niching, under a 
“classic GA” scheme. The migration to an incremental version will be performed 
in the near future, with the aim of improving the computational cost. 

2.2 MOEAs and CSs 

A multiobjective evolutionary algorithm (MOEA) and a learning classifier 
system (CS) have many related points: 

— Maintenance of a group of different solutions. The solution of an MOEA is 
usually a set of many points, approximating the set of non-inferior solutions 
(Pareto optimal set). Similarly, the solution returned by a CS is a set of many 
rules, solving together a concept description. In order to find and maintain 
such multiple optima, some niching mechanism is required. MOEAs usually 
implement sharing, with the aim of obtaining a uniform distribution of the 
Pareto front or the Pareto optimal set. 

— Evaluation of an individual. In MOEAs, the fitness of each solution depends on 
the other solutions, as happens with each rule in CSs. 

Nevertheless, we can highlight some differences: 

— Generational and Non-Generational schemes. One desirable feature in CSs is 
the on-line performance. This is promoted with a non-generational scheme, 
which introduces slight changes into the population by replacing only a small 
fraction of the population in each generation. This is not a very common 
scheme in MOEAs, although some proposals do exist. Crowding is a 
natural way of performing niching in a non-generational scheme. For that 
reason, it has been widely used in classifier systems. 

— Solution returned by the system. Another important issue is the solution that 
MOEAs and CSs must obtain. MOEAs usually try to find a well-distributed 
population belonging to the Pareto front (or Pareto optimal set), from where 
the decision maker (DM) can perform a selection. If we translate this 
objective to MOLeCS, it should be expected to find the Pareto front corresponding 
to the generality and the accuracy goals. When we use the system in the 
exploitation mode, the DM has to select the best rules from all the available 
rules in the Pareto optimal set. Which rules should be used? If we want to 
perform accurate classifications, it seems obvious to always choose the most 
accurate rules. Therefore, we can state our learning problem as a particular 
multiobjective problem (MOP) where the DM’s choices are known a priori. 
We can take advantage of this by guiding the search towards the areas preferred 
by the DM. 



3 Description of the System 



Each individual in MOLeCS codifies a rule (classifier) of type: rule : condition → 
action. The condition is the conjunction of tests over the problem attributes. It 
is represented by a ternary string over {0, 1, #}, with length equal to the number 
of describing attributes. The symbol # (don’t care) matches all values of an 
attribute, so it permits us to express generalizations. The action part of the rule 
is represented by a binary string over {0, 1}. 

Each individual must have a fitness value in order to apply the appropriate 
selective pressure. This is not an easy task, since each classifier does not represent 
a complete solution to the overall problem. In fact, each rule can match a 
different number of examples and can predict those examples in different degrees 
of accuracy. We compute these two features for each rule in the population: 



$$\mathrm{generality}(r_i) = \frac{\#\ \text{covered examples}(r_i)}{\#\ \text{examples in the training set}} \qquad (1)$$

$$\mathrm{accuracy}(r_i) = \frac{\#\ \text{correctly classified examples}(r_i)}{\#\ \text{covered examples}(r_i)} \qquad (2)$$
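
As an illustration of equations (1) and (2), the following Python sketch computes both measures for a ternary rule. It is a minimal sketch rather than the MOLeCS implementation: the function names, the representation of examples as (attribute string, class) pairs, and the convention that a rule covering no example gets accuracy 0 are assumptions made here.

```python
from typing import List, Tuple

Example = Tuple[str, str]  # (binary attribute string, class label)


def matches(condition: str, attributes: str) -> bool:
    """A condition matches if every position is '#' or equals the attribute."""
    return all(c == "#" or c == a for c, a in zip(condition, attributes))


def generality(condition: str, examples: List[Example]) -> float:
    """Equation (1): covered examples over training set size."""
    covered = sum(matches(condition, x) for x, _ in examples)
    return covered / len(examples)


def accuracy(condition: str, action: str, examples: List[Example]) -> float:
    """Equation (2): correctly classified examples over covered examples."""
    covered = [(x, y) for x, y in examples if matches(condition, x)]
    if not covered:
        return 0.0  # assumption: a rule covering nothing is given accuracy 0
    return sum(y == action for _, y in covered) / len(covered)


# Toy training set (two-input XOR), used only to exercise the functions.
training = [("00", "0"), ("01", "1"), ("10", "1"), ("11", "0")]
print(generality("0#", training), accuracy("0#", "0", training))  # 0.5 0.5
```

The rule `0#:0` covers half of the toy examples but classifies only half of those correctly, which is the kind of overgeneral rule discussed in the text.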



If fitness is only based on accuracy the search will be biased towards accurate 
but too specific rules. This can result in an enlargement of the solution set, 
poor covering, etc. On the contrary, basing fitness on generality will result in low 
performance (in terms of classification accuracy). The solution is to balance these 
two characteristics (accuracy and generality) and optimize them simultaneously. 
Our hypothesis is that a multiobjective approach will lead the search towards 
general and accurate rules, resulting in a minimum, complete and accurate set 
of rules. We have tested and compared different multiobjective strategies, which 
are described in section 3.1. 

Once the fitness assignment phase is performed, the GA proceeds to the selection 
and recombination stages. Selection is performed with stochastic universal 
sampling (SUS). 
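
For readers unfamiliar with SUS, a minimal Python sketch of the standard algorithm is given below. It is not taken from MOLeCS; the function name `sus` and its signature are assumptions, and fitness values are assumed to be non-negative with a positive sum.

```python
import random
from typing import List, Optional


def sus(fitness: List[float], n: int,
        rng: Optional[random.Random] = None) -> List[int]:
    """Select n indices using n equally spaced pointers and one random offset."""
    rng = rng or random.Random()
    step = sum(fitness) / n           # distance between consecutive pointers
    start = rng.random() * step       # single random draw in [0, step)
    chosen: List[int] = []
    i = 0
    cumulative = fitness[0]
    for k in range(n):
        pointer = start + k * step
        # Advance to the individual whose fitness segment contains the pointer.
        while cumulative <= pointer and i < len(fitness) - 1:
            i += 1
            cumulative += fitness[i]
        chosen.append(i)
    return chosen


print(sus([1.0, 1.0, 2.0], 4, random.Random(0)))
```

Because all pointers share one random offset, each individual is selected either the floor or the ceiling of its expected number of copies, which gives lower sampling variance than repeated roulette-wheel draws.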

As stated before, a concept description must be complete; that is, all the input 
examples must be covered. We use the term covering as the ratio of instances 
covered by the entire rule set RS to the size of the training set: 

$$\mathrm{covering} = \frac{\#\ \text{examples covered by } RS}{\#\ \text{examples in the training set}} \qquad (3)$$

Promoting general classifiers is not sufficient to reach 100% covering. The 
genetic algorithm can tend, due to genetic drift, to one general and 
accurate classifier, and usually one classifier does not solve the overall problem. 
Therefore, we must enforce the co-evolution of a set of fit rules by niching 
mechanisms. Niching in MOLeCS is performed in the replacement stage (section 3.2). 

Once the system has learned, it is used in an exploit (test) phase, which works 
as follows. An example coming from the test set is presented. Then, the system 
finds the matching rules and applies the fittest rule to predict the associated 
action or class. As explained before, in case of equally fit rules, the most accurate 
rule is chosen. 



3.1 Multiobjective Learning 

In the previous section, we have defined our learning as a multiobjective problem 
(MOP). Now, we formalize the concepts mentioned before. Next, we consider 
different multiobjective algorithms to solve our MOP. 



Definition 1 The MOP evaluation function $F : X \to Y$ maps decision variables 
$x = (x_1, \ldots, x_n) \in X$ to vectors $y = (y_1, \ldots, y_k) \in Y$. In MOLeCS, the decision 
variables are the rules, while the objective vectors, of dimensionality k = 2, are 
of type $y = F(x) = (f_1(x), f_2(x))$, where $f_1(x) = \mathrm{accuracy}(x)$, computed from 
equation (2), and $f_2(x) = \mathrm{generality}(x)$, from equation (1). 



Definition 2 Our multiobjective learning problem is defined as follows: 
Maximize $y = F(x) = (\mathrm{accuracy}(x), \mathrm{generality}(x))$ 

where x is the decision vector, and accuracy(x) and generality(x) are 
described by equations (2) and (1) respectively. 

There are several MOEA techniques. We have tested and compared 
four different algorithms, representative of three main algorithmic approaches^: 
a Pareto approach, a non-Pareto approach and a plain aggregating approach. 



Pareto-based Approach. We consider the Pareto approach proposed by Goldberg, 
which consists of ranking the population into non-dominated sets and 
then assigning fitness according to this rank. 

The Pareto approach gives the same selective pressure to non-dominated 
objective vectors. Suppose we have two objective vectors, $y_1 = (1, 0.6)$ and 
$y_2 = (0.5, 0.9)$, which are non-dominated as depicted in figure 1(a). In this case, the 
Pareto approach assigns them the same rank. This means that the final solution 
set returned by the system can contain overgeneral rules (e.g., vector $y_2$) as well 
as maximally general rules^ (e.g., vector $y_1$). Nevertheless, in the exploitation 
phase, these overgeneral rules will not be selected, because they can degrade 
the classification performance. The decision maker will always choose the most 
accurate rules (i.e. the maximally general rules). 

Our hypothesis is that these overgeneral rules do not contribute significantly 
to our final solution. Indeed, they can degrade our search towards an accurate 
rule set, because they consume resources from the population and thus, they 
may prevent other accurate rules from being explored. 

If we know the decision preferences in advance, we may guide the GA more 
efficiently towards these preferences. For that reason, we have designed a 
modification of the Pareto approach which gives a bias towards accurate rules. As 
shown in figure 1(b), inside each group of non-dominated classifiers a second level 
of ranking is performed, based on the accuracy of classifiers. In the following, 
we will refer to the original Pareto approach as PR (Pareto ranking) and to the 
modified algorithm with the accuracy bias as PAR (Pareto-accuracy ranking). 
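
The two rankings can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the MOLeCS code: objective vectors are (accuracy, generality) pairs to be maximized, the function names are hypothetical, and PAR is shown only as an ordering, since the text does not specify how ranks are mapped to fitness values.

```python
from typing import List, Tuple

Objective = Tuple[float, float]  # (accuracy, generality), both maximized


def dominates(a: Objective, b: Objective) -> bool:
    """True if a is at least as good as b everywhere and differs from b."""
    return all(x >= y for x, y in zip(a, b)) and a != b


def pareto_rank(objs: List[Objective]) -> List[int]:
    """PR: rank 1 for the non-dominated set, rank 2 for the next set, etc."""
    ranks = [0] * len(objs)
    remaining = set(range(len(objs)))
    rank = 1
    while remaining:
        front = {i for i in remaining
                 if not any(dominates(objs[j], objs[i]) for j in remaining)}
        for i in front:
            ranks[i] = rank
        remaining -= front
        rank += 1
    return ranks


def pareto_accuracy_order(objs: List[Objective]) -> List[int]:
    """PAR: order by Pareto rank, then by decreasing accuracy within a rank."""
    ranks = pareto_rank(objs)
    return sorted(range(len(objs)), key=lambda i: (ranks[i], -objs[i][0]))


objs = [(1.0, 0.6), (0.5, 0.9), (0.5, 0.5), (0.8, 0.6)]
print(pareto_rank(objs))            # [1, 1, 3, 2]
print(pareto_accuracy_order(objs))  # [0, 1, 3, 2]
```

The first two vectors are the $y_1$ and $y_2$ of the text: PR gives both rank 1, whereas PAR places the more accurate $y_1$ ahead of the overgeneral $y_2$.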



Population-based non-Pareto Approach. Using the idea of promoting the 
most accurate areas, we have designed a population-ranking method, based on 
the lexicographic ordering. The algorithm ranks the population according to the 
accuracy objective. When two or more equally accurate individuals are found, 
they are ordered by the generality objective. In this way, we state that the 
first goal to be achieved is accuracy (in order to obtain accurate classifiers) 
and second, these classifiers must be as general as possible. An example of this 
ranking, which we term accuracy-generality ranking (AGR), is depicted in figure 1(c). 
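
A lexicographic ordering of this kind reduces to a two-key sort. The Python sketch below is illustrative only: the function name `agr_order` is hypothetical, objective vectors are assumed to be (accuracy, generality) pairs, and "equally accurate" is taken to mean exactly equal accuracy values.

```python
from typing import List, Tuple

Objective = Tuple[float, float]  # (accuracy, generality), both maximized


def agr_order(objs: List[Objective]) -> List[int]:
    """Indices sorted by decreasing accuracy, ties broken by generality."""
    return sorted(range(len(objs)), key=lambda i: (-objs[i][0], -objs[i][1]))


objs = [(1.0, 0.25), (0.5, 0.9), (1.0, 0.5), (0.8, 0.6)]
print(agr_order(objs))  # [2, 0, 3, 1]
```

Unlike Pareto ranking, this ordering is total: the very general but inaccurate vector (0.5, 0.9) is ranked last even though no other vector dominates it.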

^ We consider here the MOEA classification scheme made by Fonseca and Fleming. 

^ Kovacs defines a maximally general rule as an accurate rule (accuracy=1.0) 
which cannot be more general without becoming inaccurate. A rule being inaccurate 
due to excessive generalizations, is called an overgeneral rule. A suboptimally general 
rule is an accurate rule that can be more general (have more ’#’) without losing its 
accuracy. 

[Figure 1: three panels, (a) PR, (b) PAR, (c) AGR; horizontal axis: accuracy] 


Fig. 1. Multiobjective evaluation methods: Pareto ranking (PR), Pareto-accuracy 
ranking (PAR) and accuracy-generality ranking (AGR). 

Plain Aggregating Approach. The weighted sum (WS) approach weighs 
and sums up all the objectives, directly obtaining a scalar fitness value. Our 
multiobjective problem is then solved as $\max \sum_{i=1}^{k} w_i y_i$, where w is the weight 
vector that must be set depending on the relative importance of the objectives. 
When we test MOLeCS with well-known problems (see section 4), we can tune 
these coefficients properly. In this case, we use the WS as a baseline against which 
the other algorithms can be compared. 
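
As a small worked example of the weighted sum, the Python sketch below scores the two objective vectors used earlier in this section. The function name is hypothetical; the weight vectors (0.75, 0.25) and (1, 0) are the ones reported later in the experiments.

```python
from typing import Sequence


def weighted_sum(y: Sequence[float], w: Sequence[float]) -> float:
    """Scalar fitness: sum of the objectives weighted by w."""
    return sum(wi * yi for wi, yi in zip(w, y))


y1 = (1.0, 0.6)  # (accuracy, generality) of a maximally general rule
y2 = (0.5, 0.9)  # (accuracy, generality) of an overgeneral rule

print(weighted_sum(y1, (0.75, 0.25)))  # approximately 0.9
print(weighted_sum(y2, (0.75, 0.25)))  # approximately 0.6
print(weighted_sum(y1, (1.0, 0.0)))    # accuracy only
```

With w = (0.75, 0.25) the accurate vector scores higher than the overgeneral one, and with w = (1, 0) the fitness reduces to accuracy alone, that is, to a single-objective optimization.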



A Priori and A Posteriori Preference Articulation. From the DM’s 
perspective, PAR, AGR and WS can be classified as a priori preference articulation 
methods, because they use the decision preferences in the fitness/selection 
stage. Our preferences are: “those accurate rules being as general as possible” 
or in other words, the maximally general rules. 

On the contrary, the Pareto based approach (PR) does not use these 
preferences until the exploitation phase, when the learning process has finished. 
Therefore, it searches for a compromise between the accuracy and the generality 
goals. From the resulting set of non-dominated solutions, the DM has to select 
the maximally general rules a posteriori. 

In the following sections we will test whether the methods based on a priori 
preference articulation can outperform the solution given by an a posteriori preference 
articulation method such as the Pareto approach. 



3.2 Niching 

Niching methods are the key point for classifier systems to evolve a population of 
diverse rules. We enforce niching in the replacement stage using crowding 
strategies. We have analysed crowding, two variants of crowding that add selective 
pressure, and deterministic crowding. 

Crowding (or crowding factor model, CF) was introduced by De Jong. 
The algorithm tries to preserve the diversity of the population by having each 
new individual (from the offspring population) replace a similar one in the parent 
population. To be exact, the new individual is compared to a subpopulation of 
cf members and the member with the highest similarity is replaced. 
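
The CF replacement step can be sketched in Python as follows. This is a minimal sketch, not the MOLeCS implementation: the function names are hypothetical, rules are represented as plain strings, and Hamming distance over the rule string is assumed as the similarity measure, which the text does not specify.

```python
import random
from typing import List


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def crowding_replace(population: List[str], child: str, cf: int,
                     rng: random.Random) -> int:
    """Replace the most similar of cf randomly drawn members by the child."""
    candidates = rng.sample(range(len(population)), cf)
    victim = min(candidates, key=lambda i: hamming(population[i], child))
    population[victim] = child
    return victim


population = ["000#", "111#", "0#01"]
replaced = crowding_replace(population, "1110", cf=3, rng=random.Random(0))
print(replaced, population)
```

With cf equal to the population size, as in this toy call, the child always replaces its nearest neighbour; smaller values of cf make the replacement noisier.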

In order to induce a convergence pressure in the CF model, two variants are 
tested: CIF and CC. Both try to replace a “low fitness and similar individual”. 
The former differs from the CF model in the selection of the subpopulation 
of cf members. Instead of selecting them randomly, the selection is performed 
with a probability inversely proportional to fitness. The second variant, which 
we term CC, was used in the Simple Classifier System (SCS). The method 
consists of selecting each member going to the cf-subpopulation from a bucket 
of csp individuals. The worst individual from the csp-bucket is inserted into the 
cf-subpopulation. Then, the algorithm proceeds as the CF model. 

Deterministic crowding (DC) makes a competition between each pair of 
{parent, offspring}, choosing the competing pairs with a minimum-distance 
criterion. The child only replaces its parent when its fitness is greater. 
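
One step of deterministic crowding, for two parents and their two offspring, might look like the Python sketch below. The pairing rule is the usual minimum total distance criterion; the function names, the Hamming distance, and the toy fitness function are assumptions made for illustration and are not taken from MOLeCS.

```python
from typing import Callable, List


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def deterministic_crowding(p1: str, p2: str, c1: str, c2: str,
                           fitness: Callable[[str], float]) -> List[str]:
    """Pair each child with its closer parent; keep the child only if fitter."""
    if hamming(p1, c1) + hamming(p2, c2) <= hamming(p1, c2) + hamming(p2, c1):
        pairs = [(p1, c1), (p2, c2)]
    else:
        pairs = [(p1, c2), (p2, c1)]
    # Strict inequality: a child replaces its parent only when strictly fitter.
    return [c if fitness(c) > fitness(p) else p for p, c in pairs]


def count_ones(s: str) -> float:
    # Toy fitness used only for this example.
    return float(s.count("1"))


survivors = deterministic_crowding("0000", "1111", "1110", "0001", count_ones)
print(survivors)  # ['0001', '1111']
```

Here each child competes against the parent it resembles most, so the two niches represented by the parents are both preserved.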

4 Experimental Results 

4.1 Design of Experiments 

We first analyse the learning performance of the system with two well-known 
learning tasks, usually tested in the CSs’ community: the multiplexer 
problem and the parity problem. This choice is made for several reasons. First, 
they make the system simple to test, since the desired solution is known. Second, 
they represent two different types of problems. The multiplexer allows 
generalizations in its rules; so the bias towards generality is desirable for the 
achievement of a minimal set. On the contrary, the parity problem does not 
need any generality pressure, since all the rules required to describe the problem 
must be specific. In this sense, we will test the ability of MOLeCS (and its 
multiobjective algorithms) to scale to the different levels of generalization. 
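
For reference, the two benchmark functions can be written in a few lines of Python. This is a sketch under stated assumptions: inputs are bit strings with the address bits first, and the parity class is taken to be 1 when the number of ones is odd (the opposite convention is equally common); the function names are hypothetical.

```python
def multiplexer(bits: str, address_bits: int) -> str:
    """Return the data bit selected by the leading address bits."""
    if len(bits) != address_bits + 2 ** address_bits:
        raise ValueError("expected address_bits + 2**address_bits inputs")
    address = int(bits[:address_bits], 2)
    return bits[address_bits + address]


def parity(bits: str) -> str:
    """Return '1' if the number of ones is odd, else '0' (assumed convention)."""
    return str(bits.count("1") % 2)


print(multiplexer("010000", 2))       # 6-mux: address 01 selects data bit 1
print(multiplexer("10100000100", 3))  # 11-mux: 3 address bits + 8 data bits
print(parity("10110"))                # 5-par
```

In the multiplexer only the address bits and the selected data bit matter, so the remaining positions can be generalized with '#'; in the parity problem every input bit affects the class, so no position can be generalized.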

The results are shown for the multiplexer with 11 inputs (11-mux) and the 
parity problem with 5 inputs (5-par). Each problem is tested with four 
multiobjective strategies (PR, PAR, AGR and WS) and four different niching methods 
(CF, CIF, CC and DC). We also compare our results with a single-objective EA, 
optimizing only accuracy (i.e., using the WS approach with w = (1, 0)). 

Each result is the average of five runs using different seed numbers. The 
parameter settings for the 11-mux problem are: population size = 800, Pgen=0.3 
(probability of generalization in the initialization of the population and in the 
mutation operator), G=0.2 (generation gap), Pc = 0.9 (probability of crossover), 
Pm=0.01 (probability of mutation per gene). If DC is used, Pc = 1.0 according 
to the algorithm definition. The crowding methods CF, CC and CIF require 
the subpopulation sizes. They were tuned beforehand, and here we only show the 
results obtained with the best parameters (whose values are reported in the 
corresponding figures). In the case of the 5-par problem, the parameter settings are 
the same, except for the population size, which is 250. 



4.2 Metrics of Performance 

In order to test the learning performance of the system, we will use the following 
metrics: 

— Covering. This is the ratio of training instances covered by the population, 
as defined in equation (3). This measure is related to the ability of the 
niching methods to maintain multiple rules. Covering can also be improved 
if generalizations are found. 

— Accuracy. Two metrics of accuracy can be computed on the overall rule set. 
The first one, termed crude accuracy (CA), is defined as the number of 
correctly classified examples over the covered examples. The second measure, 
or corrected crude accuracy (CCA), is the ratio between the correctly classified 
examples and the total number of examples presented to the system. 

— Size of the solution set. One desirable goal in learning is to minimize the size 
of the solution set. If the set is small, it is more explanatory and easier for 
human experts to understand. We will consider the size of the solution 
set as the number of different rules. 

— Optimal Population. Kovacs defines an optimal population (denoted 
as [O]) as having three characteristics: it is complete, non-overlapping and 
minimal. In the case of the 6-multiplexer problem, the optimal population in 
MOLeCS consists of 8 rules, of type: 01#0##:0. In the 11-multiplexer 
problem, [O] consists of 16 rules, while the 5-parity problem needs 32 rules. This 
measure is useful for testing if the developed rules have reached the optimal 
generality, in comparison to accuracy and covering that do not necessarily 
give us this information. In the multiplexer problem, reaching 100% 
accuracy and covering does not directly imply that [O] has been reached. 

— Learning speed. Although the speed measure is more important in the test 
epoch (exploit phase), the speed in the training epoch is also desirable, 
especially when the system has to learn from real-world applications. 
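
The covering and accuracy metrics above can be computed from the per-example predictions of the rule set. The Python sketch below is illustrative: the function name is hypothetical, an uncovered example is represented by `None`, and crude accuracy is set to 0 when nothing is covered (an assumption).

```python
from typing import List, Optional, Tuple


def rule_set_metrics(predictions: List[Optional[str]],
                     labels: List[str]) -> Tuple[float, float, float]:
    """Return (covering, crude accuracy CA, corrected crude accuracy CCA)."""
    total = len(labels)
    covered = sum(p is not None for p in predictions)
    correct = sum(p == y for p, y in zip(predictions, labels))
    covering = covered / total
    ca = correct / covered if covered else 0.0  # assumption for empty coverage
    cca = correct / total
    return covering, ca, cca


# Invented predictions for four examples; None means no rule matched.
covering, ca, cca = rule_set_metrics(["0", "1", None, "1"],
                                     ["0", "0", "1", "1"])
print(covering, ca, cca)
```

Since CCA equals CA multiplied by covering, the two accuracy measures coincide whenever covering is 100%, as in the experiments reported below.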



4.3 Results 

Figure 2 shows a summary of the results obtained in the 11-multiplexer problem. 
Each graph represents a fixed niching method, and inside each one there is a curve 
for each multiobjective strategy. The curves plot the CCA (corrected crude 
accuracy), which is measured over all the training examples. Covering is not shown 
since all methods achieved 100%. The WS approach was previously tuned 
to w = (0.75, 0.25). 

The main differences arise between the crowding methods. CF is the method 
with the worst accuracy, which ranges from 0.70 to 0.80 (see figure |2(a)| ) . Adding 
a selective pressure in the replacement stage improves the performance. This 
happens specially with CC -see figure where accuracy is about 0.90. These 
results are obtained for subpopulation sizes of 30/30. Nevertheless, this method is 
very sensitive to the parameter settings. When the size of the csp-subpopulation 
raises up, increasing the selective pressure, the accuracy (not reported here) 
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iterations iterations 



(a) CF, with cf=30 



(b) GIF, with cf=30 





iterations 



iterations 



(c) CC, with csp/cf=30/30 



(d) DC 



Fig. 2. Results in the 11-multiplexer problem. Comparison between four different 
crowding methods: CF, CIF, CC and DC. Each niching method is shown for each mul- 
tiobjective evaluation method: PR, PAR, AGR and WS with w = (0.75, 0.25). Curves 
show the corrected accuracy average over five runs, traced along 1000 iterations. 

decreases. The results achieved with deterministic crowding -figure |2(d)f - out- 
perform all previous results. Accuracy reaches 1.0 in the early generations. The 
method balances appropriately the selective pressure and the maintenance of 
niches, reaching the optimal performance. These results are consistent with other 
niching studies which demonstrate the superiority of DC on different test prob- 
lems [rnj . 
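
The replacement rule behind deterministic crowding can be sketched as
follows. This is a minimal illustration of the textbook form of the method
(each child competes with its most similar parent), not the exact
non-generational variant used in MOLeCS. The function names, the Hamming
distance and the strict tie-breaking rule are assumptions made for the
example.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def deterministic_crowding_step(p1, p2, c1, c2, fitness, distance=hamming):
    """Return the two survivors of one family tournament.

    Children are paired with the parents they resemble most, and a child
    replaces its paired parent only if it is strictly fitter (assumed rule).
    """
    same = distance(p1, c1) + distance(p2, c2)
    crossed = distance(p1, c2) + distance(p2, c1)
    if same <= crossed:
        pairs = [(p1, c1), (p2, c2)]
    else:
        pairs = [(p1, c2), (p2, c1)]
    return [child if fitness(child) > fitness(parent) else parent
            for parent, child in pairs]


# Toy usage: fitness is the number of '1' symbols (hypothetical).
count_ones = lambda s: s.count("1")
survivors = deterministic_crowding_step("0000", "1111", "1110", "0001",
                                        count_ones)
print(survivors)
```

Because a child can only displace a similar individual, a fit rule in one
niche is never removed by a fit rule from another niche, which is the
niche-maintaining behaviour discussed above.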

When the appropriate crowding method is used, there are no significant dif- 
ferences between the four multiobjective algorithms, in terms of accuracy and 
speed performance. 

In the 5-par problem, the behaviour of the different niching methods is
similar to that in the 11-mux problem (see Figure 3). What is important to
mention here is the difference in performance that arises between some
multiobjective methods. Figure 3(d) shows that the Pareto ranking approach
(PR) has the poorest accuracy (with a value of 0.78). This is because PR does
not establish any preference towards the rule accuracy, but towards a
compromise between generality and accuracy. Thus, PR searches for overgeneral
rules as well as for maximally general rules.

[Figure 3: four panels, each plotting corrected accuracy against iterations.
(a) CF, with cf=10; (b) CIF, with cf=10; (c) CC, with csp/cf=10/10; (d) DC]

Fig. 3. Results in the 5-parity problem. Comparison between four different
crowding methods: CF, CIF, CC and DC. Each niching method is shown for each
multiobjective evaluation method: PR, PAR, AGR and WS with w = (1,0). Curves
show the corrected accuracy average over five runs, traced along 2000
iterations.



These results confirm our hypothesis about the presence of overgeneral rules
in the population. Such rules do not contribute to the expected solution, and
they prevent other desirable accurate rules from being explored. Adding a
preference towards accurate rules, as PAR and AGR do, improves the previous
results. WS is the fastest approach, although its application depends on
appropriate knowledge of the weight vector settings. In these results,
w = (1,0), which is the same as single-objective optimization towards
accuracy.
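
The effect of the weight vector can be illustrated with a small sketch. It
assumes that the weights are ordered as (accuracy, generality), which is
inferred from the statement that w = (1,0) amounts to optimizing accuracy
alone. The function name and the two example rules are hypothetical.

```python
def weighted_sum(accuracy, generality, w):
    """Scalarize the two rule objectives with weights w = (w_acc, w_gen).

    The ordering of the weights is an assumption made for this sketch.
    """
    w_acc, w_gen = w
    return w_acc * accuracy + w_gen * generality


# Two hypothetical rules: an accurate specific one and an overgeneral one,
# each given as (accuracy, generality).
specific = (1.0, 0.03)
overgeneral = (0.5, 0.5)

for w in [(0.75, 0.25), (1.0, 0.0)]:
    print(w, weighted_sum(*specific, w), weighted_sum(*overgeneral, w))
```

With w = (1,0) the scalar fitness of any rule equals its accuracy, so the
generality objective no longer influences the search.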

Table 1 reports the different performance measures obtained from the final
rule set (after learning is performed). In the 11-mux problem, all the
multiobjective approaches achieve the same performance, except for PR, which
presents the largest rule set size (217). The last row in the table shows the
results obtained by single-objective learning (SO), optimizing accuracy.
Covering reaches only 93% of the examples, while the final rule set is much
more complex than in the other approaches (with 791 different rules) and the
optimal population is not reached at all.

Table 1. Results in the 11-mux and 5-par problems, summarized for the
different multiobjective strategies and compared to a single-objective
optimization method (row labeled SO). DC is used as the crowding method. In
the case of the 11-mux problem, w = (0.75, 0.25). In the case of the 5-par
problem, w = (1,0), which is the same as the single-objective algorithm. Cov
is covering, CCA is the corrected accuracy, Size is the number of different
rules and %[O] is the percentage of the optimal population reached by the
final rule set. These measures are computed from the final rule set (obtained
when the training epoch has finished).

              11-multiplexer                5-parity
        Cov    CCA    Size   %[O]     Cov    CCA    Size   %[O]
PR      1      1      217    1        1      0.78   ...    0.60
PAR     1      ...    40     1        1      0.99   34     0.99
AGR     1      1      40     1        1      0.99   34     0.99
WS      1      1      40     1        1      0.99   32     0.98
SO      0.93   0.93   791    0

In the 5-par problem, PR converged with an accuracy of 0.78 and only 60% of
the expected optimal population. In fact, the population has collapsed to the
non-inferior solutions, which represent only two points in the objective
space. The first one, with objective vector y_1 = (0.03, 1), corresponds to
rules of the type 00010 : 1. The second one, with vector y_2 = (0.5, 0.5),
corresponds to the overgeneral rules ##### : 1 and ##### : 0. Once the
population has converged to these points, it is very difficult to increase
the number of specific rules, because of the presence of too many #’s in the
population schemata. This is also the reason for such a small set size.
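
The two objective vectors quoted above can be reproduced with a short
sketch. It assumes that generality is the number of correctly classified
examples divided by the total number of examples, and that accuracy is the
number of correctly classified examples divided by the number of covered
examples. These definitions are not restated in this section; they are chosen
here because they reproduce the quoted values. All function names are
hypothetical.

```python
from itertools import product


def parity(bits):
    """Class of a 5-parity example: 1 if the number of ones is odd."""
    return sum(bits) % 2


def matches(condition, bits):
    """A ternary condition matches when every non-# symbol equals the bit."""
    return all(c == "#" or int(c) == b for c, b in zip(condition, bits))


def objectives(condition, action, n=5):
    """Return (generality, accuracy) under the assumed definitions."""
    examples = list(product((0, 1), repeat=n))
    covered = [x for x in examples if matches(condition, x)]
    correct = sum(parity(x) == action for x in covered)
    return correct / len(examples), correct / len(covered)


def dominates(u, v):
    """Pareto dominance when both objectives are maximized."""
    return (all(a >= b for a, b in zip(u, v))
            and any(a > b for a, b in zip(u, v)))


y1 = objectives("00010", 1)   # specific, fully accurate rule
y2 = objectives("#####", 1)   # overgeneral rule
print(y1, y2, dominates(y1, y2), dominates(y2, y1))
```

Under these assumptions neither vector dominates the other, so a pure Pareto
ranking keeps the overgeneral rules alongside the accurate ones.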

5 Application to a Real-World Classification Problem 

In this section, we apply MOLeCS to the Wisconsin breast cancer database,
obtained from the UCI repository. The database contains 699 instances, with 9
numerical attributes ranging from 1 to 10, and two classes (benign or
malignant). There are 16 instances with missing attribute values. The class
distributions are unbalanced, with 458 (65.5%) benign and 241 (34.5%)
malignant instances.

As the describing attributes are numerical and not binary, we must reconsider
our rule representation. One possibility is to discretize each numerical
feature into a string of bits, which allows us to maintain our binary
representation in the rules. The second possibility is to represent a rule as
a set of real-valued intervals, as proposed by Wilson in [23]. We have
implemented and tested both representations, without finding significant
differences for the Wisconsin database.
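
The two representations can be contrasted with a minimal sketch. The 4-bit
binary encoding below is only one possible discretization and is an
assumption of this example, as are the function names; the encoding actually
used in the experiments is not specified here.

```python
def encode_attribute(value, n_bits=4):
    """Encode an integer attribute in 1..10 as a fixed-width bit string."""
    if not 1 <= value <= 10:
        raise ValueError("attribute out of range 1..10")
    return format(value - 1, f"0{n_bits}b")


def matches_ternary(condition, instance):
    """Match a {0,1,#} condition against the bit-encoded instance."""
    bits = "".join(encode_attribute(v) for v in instance)
    return len(condition) == len(bits) and all(
        c == "#" or c == b for c, b in zip(condition, bits))


def matches_intervals(intervals, instance):
    """Match a list of (low, high) intervals, one per attribute."""
    return all(low <= v <= high for (low, high), v in zip(intervals, instance))


# Toy instance with two attributes only, to keep the example short.
instance = (1, 10)
print(matches_ternary("0000####", instance))
print(matches_intervals([(1, 3), (5, 10)], instance))
```

In the bit-string form, generalization is expressed with # symbols; in the
interval form, it is expressed by widening the interval of an attribute.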

Each experiment is averaged over five different random seeds. Accuracy is
estimated using ten-fold cross-validation (for details see PD]). The results
reported in Table 2 show the covering, the crude accuracy and the corrected
accuracy, measured on test sets. MOLeCS is run with DC and the four
multiobjective strategies. We also ran the single-objective optimization
algorithm, obtaining a corrected accuracy of 0.90. PR achieved a result of
0.65, while PAR and AGR

Table 2. Results using the Wisconsin database, obtained with a ten-fold
cross-validation experiment. We compare the four multiobjective strategies
and the single-objective algorithm, using DC as crowding method. The table
reports: Cov (ratio of covered examples), CA (crude accuracy) and CCA
(corrected crude accuracy).

                 Cov     CA      CCA
PR               1       0.65    0.65
PAR              0.98    0.97    0.95
AGR              0.98    0.97    0.95
WS (.75,.25)     0.98    0.94    0.92
SO               0.94    0.95    0.90

reached the maximum value, with an accuracy of 0.95. This confirms the
results obtained with the multiplexer and parity problems. First, it seems
suitable to apply multiobjective methods in order to optimize the accuracy
and generality of each rule. Enforcing only accuracy leads the system to
develop a high number of specific rules. This makes the learning more
difficult, because more rules are needed to describe the problem. SO resulted
in less covering and even less crude accuracy than PAR and AGR. Second, if we
give the same pressure (or preference) to accuracy as to generality (as PR
does), we degrade the accuracy of the final rule set, as happened with the
5-par problem. Therefore, the best learning performance is achieved when we
optimize generality and accuracy, but with a preference towards accuracy, as
implemented by the methods PAR and AGR. The accuracy obtained is
0.95 ± 0.016 (95% confidence interval), which is comparable to other
learning classifier systems. XCS [21] also reached an accuracy of 0.95 in a
similar experiment using the same database.
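
The three measures of Table 2 appear to be related in a simple way. The
sketch below assumes that the corrected accuracy is the fraction of all test
examples that are correctly classified, so that CCA = Cov x CA. This relation
is an assumption of the example; it is checked only against the rounded
values reported in the table.

```python
# (Cov, CA, CCA) as reported in Table 2.
TABLE_2 = {
    "PR": (1.00, 0.65, 0.65),
    "PAR": (0.98, 0.97, 0.95),
    "AGR": (0.98, 0.97, 0.95),
    "WS": (0.98, 0.94, 0.92),
    "SO": (0.94, 0.95, 0.90),
}


def cca_from(cov, ca):
    """Corrected accuracy under the assumed relation CCA = Cov * CA."""
    return cov * ca


for name, (cov, ca, cca) in TABLE_2.items():
    print(f"{name:4s} Cov*CA = {cca_from(cov, ca):.3f}   reported = {cca:.2f}")
```

The products agree with the reported CCA values to within the rounding of the
table entries, which illustrates why SO is penalized twice: it covers fewer
examples and is also less accurate on those it covers.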



6 Conclusions and Future Work 

This paper has studied the performance of MOLeCS using different MOEA
techniques. The results are compared to a single-objective EA optimizing only
the accuracy goal, in order to prove the suitability of the multiobjective
approach. The experiments with single-objective optimization demonstrate that
the system evolves too many specific rules. This enlarges the solution set,
making the learning more difficult and achieving poor covering. If we
optimize the accuracy and generality of each rule, we improve the learning
performance. Nevertheless, giving the same importance to these objectives (as
Pareto ranking does) makes the system evolve overgeneral rules in the search
process, preventing other maximally general rules from being explored and
maintained. The overall accuracy of the final rule set is thus degraded. In
this sense, including the decision preferences in the search (e.g., with PAR
or AGR) leads to a better achievement of the learning goals. The results with
the Wisconsin database have reached an accuracy of 0.95, performing as well
as XCS.

The use of niching methods in MOLeCS is necessary to ensure the maintenance
of a set of rules that covers the examples. The paper has studied different
crowding algorithms under a non-generational scheme. In terms of covering and
stability, the best results are obtained with deterministic crowding.

As future work, it is necessary to investigate further the applicability of
MOLeCS to real-world databases. We can test the performance of MOLeCS on more
complex problems, for instance those with more describing attributes or with
multiple categories rather than binary classes.

When we deal with medical databases with two unbalanced classes, it is
interesting to distinguish between the correct predictions made by the system
when the true decision is “benign” and the correct predictions when the true
decision is “malignant”. In this case, the accuracy measure does not give
enough information. Other measures such as sensitivity, specificity and area
under the ROC curve must be included in our further analysis.
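
Sensitivity and specificity can be computed from the predictions as sketched
below. The function name and the toy labels are hypothetical and serve only
to illustrate the definitions; they are not results of MOLeCS.

```python
def sensitivity_specificity(y_true, y_pred, positive="malignant"):
    """Return (sensitivity, specificity) for a two-class problem.

    Sensitivity is the fraction of positive cases predicted as positive;
    specificity is the fraction of negative cases predicted as negative.
    Both classes are assumed to be present in y_true.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    return tp / (tp + fn), tn / (tn + fp)


# Toy data: 4 malignant and 6 benign cases.
y_true = ["malignant"] * 4 + ["benign"] * 6
y_pred = (["malignant"] * 3 + ["benign"]
          + ["benign"] * 5 + ["malignant"])
sens, spec = sensitivity_specificity(y_true, y_pred)
print(sens, spec)
```

In this toy case the overall accuracy is 0.8, yet one malignant case in four
is missed, which is the kind of information that accuracy alone hides.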

Another important line of future research with MOLeCS is to study our
approach on problems with highly unbalanced classes. Applying pressure
towards generalization might displace specific rules (that cover few examples
from a certain class) in favour of other general rules (covering examples
from other, more numerous classes). This can be prevented in the replacement
stage or by measuring the generality of each rule relative to its niche.